Deli Zhao
Papers: 26
Authored papers
RynnBrain: Open Embodied Foundation Models
arXiv 2026
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
arXiv 2025
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers
arXiv 2025
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
arXiv 2025
EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?
arXiv 2025
MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra
arXiv 2025
RynnVLA-002: A Unified Vision-Language-Action and World Model
arXiv 2025
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
arXiv 2025
WorldVLA: Towards Autoregressive Action World Model
arXiv 2025
RynnEC: Bringing MLLMs into Embodied World
arXiv 2025
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
arXiv 2025
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving
arXiv 2025
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency
arXiv 2025
STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs
arXiv 2025
GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning
arXiv 2025
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
arXiv 2024
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
arXiv 2024
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
CVPR 2025
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
arXiv 2024
Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents
arXiv 2024
Auto-Arena: Automating LLM Evaluations with Agent Peer Battles and Committee Discussions
arXiv 2024
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
arXiv 2023
Composer: Creative and Controllable Image Synthesis with Composable Conditions
arXiv 2023
RLIPv2: Fast Scaling of Relational Language-Image Pre-training
ICCV 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
ICCV 2023
Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos
ICCV 2023
Frequent co-authors
10 from 26 papers