Rui Shao
- Papers: 12
Authored papers
PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records
arXiv 2026
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
arXiv 2026
Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts
arXiv 2025
FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
ICCV 2025
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
CVPR 2025
HiconAgent: History Context-aware Policy Optimization for GUI Agents
arXiv 2025
CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification
arXiv 2025
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
CVPR 2025
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
arXiv 2024
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation
arXiv 2024
Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding
arXiv 2024
Detecting and Grounding Multi-Modal Media Manipulation
CVPR 2023