Yi Liu
- Papers: 26
Authored papers
ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models
arXiv 2026
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
arXiv 2025
PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction
arXiv 2025
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning
arXiv 2025
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
arXiv 2025
Step-Audio 2 Technical Report
arXiv 2025
Enabling Versatile Controls for Video Diffusion Models
arXiv 2025
MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs
arXiv 2025
Unified Embodied VLM Reasoning with Robotic Action via Autoregressive Discretized Pre-training
arXiv 2025
RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer
arXiv 2024
MooER: LLM-based Speech Recognition and Translation Models from Moore Threads
arXiv 2024
TempCompass: Do Video LLMs Really Understand Videos?
arXiv 2024
CAMixerSR: Only Details Need More "Attention"
CVPR 2024
PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization
arXiv 2024
Alignment-Enhanced Decoding: Defending via Token-Level Adaptive Refining of Probability Distributions
arXiv 2024
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
arXiv 2024
A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models
arXiv 2024
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions
arXiv 2024
DETRs Beat YOLOs on Real-time Object Detection
CVPR 2024
PP-MobileSeg: Explore the Fast and Accurate Semantic Segmentation Model on Mobile Devices
arXiv 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
CVPR 2024
Semi-Offline Reinforcement Learning for Optimized Text Generation
arXiv 2023
PentestGPT: An LLM-empowered Automatic Penetration Testing Tool
arXiv 2023
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
CVPR 2023
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
arXiv 2023
EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate
Frequent co-authors
10 from 26 papers