Siteng Huang
- Papers: 17
Authored papers (17)
RynnBrain: Open Embodied Foundation Models
arXiv 2026
Text-Only Data Synthesis for Vision Language Model Training
arXiv 2025
MMaDA-VLA: Large Diffusion Vision-Language-Action Model with Unified Multi-Modal Instruction and Generation
arXiv 2026
Exploring the Evolution of Physics Cognition in Video Generation: A Survey
arXiv 2025
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation
arXiv 2025
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
arXiv 2025
Compression with Global Guidance: Towards Training-free High-Resolution MLLMs Acceleration
arXiv 2025
HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models
arXiv 2025
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
arXiv 2025
RynnVLA-002: A Unified Vision-Language-Action and World Model
arXiv 2025
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
arXiv 2025
WorldVLA: Towards Autoregressive Action World Model
arXiv 2025
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
arXiv 2024
Accelerating Diffusion Transformers with Token-wise Feature Caching
arXiv 2024
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
arXiv 2024
Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning
CVPR 2024
VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval
CVPR 2023
Affiliations
Frequent co-authors (10, from 17 papers)