Deli Zhao

Papers
26

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

26

RynnBrain: Open Embodied Foundation Models

arXiv 2026

2026

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

arXiv 2025

2025

Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers

arXiv 2025

2025

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

arXiv 2025

2025

EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?

arXiv 2025

2025

MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra

arXiv 2025

2025

RynnVLA-002: A Unified Vision-Language-Action and World Model

arXiv 2025

2025

RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation

arXiv 2025

2025

WorldVLA: Towards Autoregressive Action World Model

arXiv 2025

2025

RynnEC: Bringing MLLMs into Embodied World

arXiv 2025

2025

VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning

arXiv 2025

2025

FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving

arXiv 2025

2025

Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency

arXiv 2025

2025

STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs

arXiv 2025

2025

GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning

arXiv 2025

2025

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

arXiv 2024

2024

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

arXiv 2024

2024

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

CVPR 2025

2024

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

arXiv 2024

2024

Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents

arXiv 2024

2024

Auto-Arena: Automating LLM Evaluations with Agent Peer Battles and Committee Discussions

arXiv 2024

2024

I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models

arXiv 2023

2023

Composer: Creative and Controllable Image Synthesis with Composable Conditions

arXiv 2023

2023

RLIPv2: Fast Scaling of Relational Language-Image Pre-training

ICCV 2023

2023

Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning

ICCV 2023

2023

Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos

ICCV 2023

2023

Affiliations

No known affiliations.

Frequent co-authors

10 co-authors, from 26 papers