Yilun Chen
Papers: 17
Authored papers (17)
InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation
arXiv 2026
MM-ACT: Learn from Multimodal Parallel Generation to Act
arXiv 2025
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling
arXiv 2025
InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
arXiv 2025
Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection
arXiv 2025
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
arXiv 2025
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
CVPR 2025 1
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
arXiv 2025
GRUtopia: Dream General Robots in a City at Scale
arXiv 2024
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
arXiv 2024
Grounded 3D-LLM with Referent Tokens
arXiv 2024
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
arXiv 2024
What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights
arXiv 2024
PointLLM: Empowering Large Language Models to Understand Point Clouds
arXiv 2023
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
arXiv 2023
FocalFormer3D: Focusing on Hard Instance for 3D Object Detection
arXiv 2023
VIMI: Vehicle-Infrastructure Multi-view Intermediate Fusion for Camera-based 3D Object Detection
arXiv 2023
Affiliations
Frequent co-authors
10 from 17 papers