Yuntao Chen
Authored papers (17)
AutoGUI-v2: A Comprehensive Multi-Modal GUI Functionality Understanding Benchmark
arXiv 2026
GoClick: Lightweight Element Grounding Model for Autonomous GUI Interaction
arXiv 2026
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling
arXiv 2025
Unified Vision-Language-Action Model
arXiv 2025
Multi-Agent Tool-Integrated Policy Optimization
arXiv 2025
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy
ICCV 2025
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
CVPR 2024
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
arXiv 2024
OpenSatMap: A Fine-grained High-resolution Satellite Dataset for Large-scale Map Construction
arXiv 2024
Monocular Occupancy Prediction for Scalable Indoor Scenes
arXiv 2024
Enhancing End-to-End Autonomous Driving with Latent World Model
arXiv 2024
Diffusion Transformer Policy
arXiv 2024
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
arXiv 2023
Once Detected, Never Lost: Surpassing Human Performance in Offline LiDAR based 3D Object Detection
ICCV 2023
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation
CVPR 2024
FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection
CVPR 2023
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
CVPR 2024
Frequent co-authors
10 from 17 papers