Wei Liu

Papers
84

Authored papers

Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution

arXiv 2026

2026

Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations

arXiv 2026

2026

KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration

arXiv 2026

2026

Forcing-KV: Hybrid KV Cache Compression for Efficient Autoregressive Video Diffusion Models

arXiv 2026

2026

LongCat-Flash-Thinking-2601 Technical Report

arXiv 2026

2026

RealWonder: Real-Time Physical Action-Conditioned Video Generation

arXiv 2026

2026

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

arXiv 2026

2026

Mobile GUI Agents under Real-world Threats: Are We There Yet?

arXiv 2026

2026

Delta-Adapter: Scalable Exemplar-Based Image Editing with Single-Pair Supervision

arXiv 2026

2026

RM-Distiller: Exploiting Generative LLM for Reward Model Distillation

arXiv 2026

2026

Hunt Instead of Wait: Evaluating Deep Data Research on Large Language Models

arXiv 2026

2026

DanceGRPO: Unleashing GRPO on Visual Generation

arXiv 2025

2025

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

arXiv 2025

2025

MiMo-VL Technical Report

arXiv 2025

2025

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

arXiv 2025

2025

Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning

arXiv 2025

2025

Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging

arXiv 2025

2025

Learn to Reason Efficiently with Adaptive Length-based Reward Shaping

arXiv 2025

2025

TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization

arXiv 2025

2025

NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning

arXiv 2025

2025

EcoLANG: Efficient and Effective Agent Communication Language Induction for Social Simulation

arXiv 2025

2025

SKEL-CF: Coarse-to-Fine Biomechanical Skeleton and Surface Mesh Recovery

arXiv 2025

2025

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

arXiv 2025

2025

AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement

arXiv 2025

2025

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

arXiv 2025

2025

UQ: Assessing Language Models on Unsolved Questions

arXiv 2025

2025

GCPO: When Contrast Fails, Go Gold

arXiv 2025

2025

Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study

arXiv 2025

2025

TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

arXiv 2025

2025

Think-J: Learning to Think for Generative LLM-as-a-Judge

arXiv 2025

2025

AIR: Complex Instruction Generation via Automatic Iterative Refinement

arXiv 2025

2025

XRAG: Cross-lingual Retrieval-Augmented Generation

arXiv 2025

2025

Multi-Agent Collaboration via Cross-Team Orchestration

arXiv 2024

2024

IDOL: Instant Photorealistic 3D Human Creation from a Single Image

CVPR 2025

2024

Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models

arXiv 2024

2024

Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation

arXiv 2024

2024

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

arXiv 2024

2024

VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control

arXiv 2024

2024

MambaMIM: Pre-training Mamba with State Space Token Interpolation and its Application to Medical Image Segmentation

arXiv 2024

2024

Autonomous Agents for Collaborative Task under Information Asymmetry

arXiv 2024

2024

MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding

arXiv 2024

2024

SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM

arXiv 2024

2024

LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models

arXiv 2024

2024

Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images

arXiv 2024

2024

A Comprehensive Evaluation of Quantization Strategies for Large Language Models

arXiv 2024

2024

Large Language Models are In-Context Molecule Learners

arXiv 2024

2024

Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations

arXiv 2024

2024

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

arXiv 2024

2024

Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts

arXiv 2024

2024

OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models

arXiv 2024

2024

X-MOBILITY: End-To-End Generalizable Navigation via World Modeling

arXiv 2024

2024

Analysing The Impact of Sequence Composition on Language Model Pre-Training

arXiv 2024

2024

Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents

arXiv 2024

2024

NFT1000: A Cross-Modal Dataset for Non-Fungible Token Retrieval

arXiv 2024

2024

ToolPlanner: A Tool Augmented LLM for Multi Granularity Instructions with Path Planning and Feedback

arXiv 2024

2024

Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model

arXiv 2024

2024

KG-Retriever: Efficient Knowledge Indexing for Retrieval-Augmented Large Language Models

arXiv 2024

2024

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

arXiv 2023

2023

What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning

arXiv 2023

2023

DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4

arXiv 2023

2023

GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions

arXiv 2023

2023

BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks

arXiv 2023

2023

SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding

arXiv 2023

2023

DrugAssist: A Large Language Model for Molecule Optimization

arXiv 2023

2023

Plug-and-Play Regulators for Image-Text Matching

arXiv 2023

2023

SmoothVideo: Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning

arXiv 2023

2023

Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA

arXiv 2023

2023

Masked Autoencoders for Point Cloud Self-supervised Learning

arXiv 2022

2022

Curriculum-based Asymmetric Multi-task Reinforcement Learning

arXiv 2022

2022

DyStyle: Dynamic Neural Network for Multi-Attribute-Conditioned Style Editing

arXiv 2022

2022

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task

arXiv 2022

2022

Deep Face Restoration: A Survey

arXiv 2022

2022

Egocentric Video-Language Pretraining

arXiv 2022

2022

MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Moral Discussions

arXiv 2022

2022

CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention

2021

MC-Blur: A Comprehensive Benchmark for Image Deblurring

arXiv 2021

2021

UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction

Findings (ACL) 2021

2021

Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics

arXiv 2020

2020

Defense-friendly Images in Adversarial Attacks: Dataset and Metrics for Perturbation Difficulty

arXiv 2020

2020

Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics

2019

In Conclusion Not Repetition: Comprehensive Abstractive Summarization With Diversified Attention Based On Determinantal Point Processes

2019

Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression

arXiv 2019

2019

Frustum PointNets for 3D Object Detection from RGB-D Data

2017

SSD: Single Shot MultiBox Detector

arXiv 2015

2015

Affiliations

No known affiliations.

Frequent co-authors

10

from 84 papers