Shuhuai Ren

Papers
21

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

21

MiMo-V2-Flash Technical Report

arXiv 2026

2026

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

arXiv 2025

2025

MiMo-VL Technical Report

arXiv 2025

2025

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

arXiv 2025

2025

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction

arXiv 2025

2025

Next Block Prediction: Video Generation via Semi-Auto-Regressive Modeling

arXiv 2025

2025

MiMo-Embodied: X-Embodied Foundation Model Technical Report

arXiv 2025

2025

UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?

arXiv 2025

2025

GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation

arXiv 2025

2025

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

ICCV 2025

2025

TEMPLE: Temporal Preference Learning of Video LLMs via Difficulty Scheduling and Pre-SFT Alignment

arXiv 2025

2025

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

arXiv 2024

2024

Parallelized Autoregressive Visual Generation

CVPR 2025

2024

TempCompass: Do Video LLMs Really Understand Videos?

arXiv 2024

2024

PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain

arXiv 2024

2024

LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?

arXiv 2024

2024

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

CVPR 2024

2023

Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition

2023

TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding

arXiv 2023

2023

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

arXiv 2023

2023

Delving into the Openness of CLIP

arXiv 2022

2022

Affiliations

No known affiliations.

Frequent co-authors

10

from 21 papers