Yueting Zhuang
- Papers: 39
Authored papers (39)
Self-Distilled Agentic Reinforcement Learning
arXiv 2026
InstructSAM: Segment Any Instance with Any Instructions
arXiv 2026
GroundAct: Can LLM Agents Ground Actions in Environmental States?
arXiv 2025
InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
arXiv 2026
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
arXiv 2026
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
arXiv 2026
SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments
arXiv 2026
KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation
arXiv 2026
UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding
arXiv 2026
Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning
arXiv 2026
UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization
arXiv 2026
EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering
arXiv 2025
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
arXiv 2025
SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation
arXiv 2025
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
arXiv 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
arXiv 2025
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
arXiv 2025
MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models
arXiv 2025
Think Twice, Click Once: Enhancing GUI Grounding via Fast and Slow Systems
arXiv 2025
Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning
arXiv 2025
UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
arXiv 2025
GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts
arXiv 2025
GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding
arXiv 2025
Hierarchical Budget Policy Optimization for Adaptive Reasoning
arXiv 2025
Let LLMs Break Free from Overthinking via Self-Braking Tuning
arXiv 2025
EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?
arXiv 2025
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
arXiv 2025
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
CVPR 2025
Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization
arXiv 2024
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
arXiv 2024
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
arXiv 2024
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
arXiv 2024
Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation
arXiv 2024
Triad: A Framework Leveraging a Multi-Role LLM-based Agent to Solve Knowledge Base Question Answering
arXiv 2024
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
arXiv 2023
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
CVPR 2024
Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
ICCV 2023
Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow
arXiv 2023
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering
arXiv 2019
Frequent co-authors
10 from 39 papers