Jun Xiao
- Papers: 32
Authored papers
Self-Distilled Agentic Reinforcement Learning
arXiv 2026
InstructSAM: Segment Any Instance with Any Instructions
arXiv 2026
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
arXiv 2026
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
arXiv 2026
SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments
arXiv 2026
KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation
arXiv 2026
GroundAct: Can LLM Agents Ground Actions in Environmental States?
arXiv 2025
UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding
arXiv 2026
Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning
arXiv 2026
DREAM: Where Visual Understanding Meets Text-to-Image Generation
arXiv 2026
UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization
arXiv 2026
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
arXiv 2025
Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning
arXiv 2025
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
CVPR 2025
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
arXiv 2025
UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
arXiv 2025
GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts
arXiv 2025
GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding
arXiv 2025
Hierarchical Budget Policy Optimization for Adaptive Reasoning
arXiv 2025
Let LLMs Break Free from Overthinking via Self-Braking Tuning
arXiv 2025
EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?
arXiv 2025
Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models
arXiv 2025
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
arXiv 2025
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
arXiv 2024
ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models
arXiv 2024
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
arXiv 2024
World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering
arXiv 2024
Towards Multi-View Consistent Style Transfer with One-Step Diffusion via Vision Conditioning
arXiv 2024
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation
arXiv 2023
Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection
arXiv 2023
Compositional Feature Augmentation for Unbiased Scene Graph Generation
ICCV 2023
Unified Normalization for Accelerating and Stabilizing Transformers
arXiv 2022