Wei Liu
- Papers
- 84
Authored papers
84
Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution
arXiv 2026
Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations
arXiv 2026
KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration
arXiv 2026
Forcing-KV: Hybrid KV Cache Compression for Efficient Autoregressive Video Diffusion Models
arXiv 2026
LongCat-Flash-Thinking-2601 Technical Report
arXiv 2026
RealWonder: Real-Time Physical Action-Conditioned Video Generation
arXiv 2026
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision
arXiv 2026
Mobile GUI Agents under Real-world Threats: Are We There Yet?
arXiv 2026
Delta-Adapter: Scalable Exemplar-Based Image Editing with Single-Pair Supervision
arXiv 2026
RM-Distiller: Exploiting Generative LLM for Reward Model Distillation
arXiv 2026
Hunt Instead of Wait: Evaluating Deep Data Research on Large Language Models
arXiv 2026
DanceGRPO: Unleashing GRPO on Visual Generation
arXiv 2025
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
arXiv 2025
MiMo-VL Technical Report
arXiv 2025
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
arXiv 2025
Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning
arXiv 2025
Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging
arXiv 2025
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
arXiv 2025
TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization
arXiv 2025
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
arXiv 2025
EcoLANG: Efficient and Effective Agent Communication Language Induction for Social Simulation
arXiv 2025
SKEL-CF: Coarse-to-Fine Biomechanical Skeleton and Surface Mesh Recovery
arXiv 2025
Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?
arXiv 2025
AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement
arXiv 2025
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
arXiv 2025
UQ: Assessing Language Models on Unsolved Questions
arXiv 2025
GCPO: When Contrast Fails, Go Gold
arXiv 2025
Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study
arXiv 2025
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
arXiv 2025
Think-J: Learning to Think for Generative LLM-as-a-Judge
arXiv 2025
AIR: Complex Instruction Generation via Automatic Iterative Refinement
arXiv 2025
XRAG: Cross-lingual Retrieval-Augmented Generation
arXiv 2025
Multi-Agent Collaboration via Cross-Team Orchestration
arXiv 2024
IDOL: Instant Photorealistic 3D Human Creation from a Single Image
CVPR 2025 1
Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models
arXiv 2024
Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation
arXiv 2024
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
arXiv 2024
VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control
arXiv 2024
MambaMIM: Pre-training Mamba with State Space Token Interpolation and its Application to Medical Image Segmentation
arXiv 2024
Autonomous Agents for Collaborative Task under Information Asymmetry
arXiv 2024
MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding
arXiv 2024
SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM
arXiv 2024
LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models
arXiv 2024
Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images
arXiv 2024
A Comprehensive Evaluation of Quantization Strategies for Large Language Models
arXiv 2024
Large Language Models are In-Context Molecule Learners
arXiv 2024
Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations
arXiv 2024
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
arXiv 2024
Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts
arXiv 2024
OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models
arXiv 2024
X-MOBILITY: End-To-End Generalizable Navigation via World Modeling
arXiv 2024
Analysing The Impact of Sequence Composition on Language Model Pre-Training
arXiv 2024
Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents
arXiv 2024
NFT1000: A Cross-Modal Dataset for Non-Fungible Token Retrieval
arXiv 2024
ToolPlanner: A Tool Augmented LLM for Multi Granularity Instructions with Path Planning and Feedback
arXiv 2024
Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model
arXiv 2024
KG-Retriever: Efficient Knowledge Indexing for Retrieval-Augmented Large Language Models
arXiv 2024
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
arXiv 2023
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning
arXiv 2023
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4
arXiv 2023
GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions
arXiv 2023
BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks
arXiv 2023
SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding
arXiv 2023
DrugAssist: A Large Language Model for Molecule Optimization
arXiv 2023
Plug-and-Play Regulators for Image-Text Matching
arXiv 2023
SmoothVideo: Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning
arXiv 2023
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA
arXiv 2023
Masked Autoencoders for Point Cloud Self-supervised Learning
arXiv 2022
Curriculum-based Asymmetric Multi-task Reinforcement Learning
arXiv 2022
DyStyle: Dynamic Neural Network for Multi-Attribute-Conditioned Style Editing
arXiv 2022
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task
arXiv 2022
Deep Face Restoration: A Survey
arXiv 2022
Egocentric Video-Language Pretraining
arXiv 2022
MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Moral Discussions
arXiv 2022
CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
MC-Blur: A Comprehensive Benchmark for Image Deblurring
arXiv 2021
UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction
Findings (ACL) 2021 8
Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics
arXiv 2020
Defense-friendly Images in Adversarial Attacks: Dataset and Metrics for Perturbation Difficulty
arXiv 2020
Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics
In Conclusion Not Repetition: Comprehensive Abstractive Summarization With Diversified Attention Based On Determinantal Point Processes
Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression
arXiv 2019
Frustum PointNets for 3D Object Detection from RGB-D Data
SSD: Single Shot MultiBox Detector
arXiv 2015
Affiliations
Frequent co-authors
10 from 84 papers