Yongming Rao
Authored papers (20)
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents
arXiv 2026
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training
arXiv 2026
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization
arXiv 2025
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
arXiv 2025
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
ICCV 2025
Ola: Pushing the Frontiers of Omni-Modal Language Model
arXiv 2025
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
arXiv 2024
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
CVPR 2025
Efficient Inference of Vision Instruction-Following Models with Elastic Cache
arXiv 2024
Unleashing Text-to-Image Diffusion Models for Visual Perception
Generative Multimodal Models are In-Context Learners
CVPR 2024
UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models
TCOVIS: Temporally Consistent Online Video Instance Segmentation
ICCV 2023
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior
CVPR 2024
Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models
ICCV 2023
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
arXiv 2022
P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting
arXiv 2022
SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation
CVPR 2022
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
CVPR 2022
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
CVPR 2022
Frequent co-authors
10 from 20 papers