Yuntao Chen

Papers
17


Authored papers

17

AutoGUI-v2: A Comprehensive Multi-Modal GUI Functionality Understanding Benchmark

arXiv 2026

2026

GoClick: Lightweight Element Grounding Model for Autonomous GUI Interaction

arXiv 2026

2026

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

arXiv 2025

2025

Unified Vision-Language-Action Model

arXiv 2025

2025

Multi-Agent Tool-Integrated Policy Optimization

arXiv 2025

2025

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

ICCV 2025

2025

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

CVPR 2024 1

2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

arXiv 2024

2024

OpenSatMap: A Fine-grained High-resolution Satellite Dataset for Large-scale Map Construction

arXiv 2024

2024

Monocular Occupancy Prediction for Scalable Indoor Scenes

arXiv 2024

2024

Enhancing End-to-End Autonomous Driving with Latent World Model

arXiv 2024

2024

Diffusion Transformer Policy

arXiv 2024

2024

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

arXiv 2023

2023

Once Detected, Never Lost: Surpassing Human Performance in Offline LiDAR based 3D Object Detection

ICCV 2023 1

2023

PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation

CVPR 2024 1

2023

FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection

CVPR 2023 1

2023

Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving

CVPR 2024 1

2023

Affiliations

No known affiliations.

Frequent co-authors

10

from 17 papers