Xin Eric Wang
- Papers: 30
Authored papers (30)
Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants
arXiv 2026
Rethinking Memory Mechanisms of Foundation Agents in the Second Half: A Survey
arXiv 2026
Auditing Agent Harness Safety
arXiv 2026
Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling
arXiv 2026
SafeGround: Know When to Trust GUI Grounding Models via Uncertainty Calibration
arXiv 2026
On the Reliability of Computer Use Agents
arXiv 2026
Group-Evolving Agents: Open-Ended Self-Improvement via Experience Sharing
arXiv 2026
The Unreasonable Effectiveness of Scaling Agents for Computer Use
arXiv 2025
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
arXiv 2025
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
arXiv 2025
GRIT: Teaching MLLMs to Think with Images
arXiv 2025
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
arXiv 2025
Agents of Change: Self-Evolving LLM Agents for Strategic Planning
arXiv 2025
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents
arXiv 2025
Hidden in Plain Sight: Reasoning in Underspecified and Misspecified Scenarios for Multimodal LLMs
arXiv 2025
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
arXiv 2024
Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA
arXiv 2024
Multimodal Situational Safety
arXiv 2024
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
arXiv 2024
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding
arXiv 2024
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens
arXiv 2023
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
NeurIPS 2023 11
Multimodal Procedural Planning via Dual Text-Image Prompting
arXiv 2023
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners
arXiv 2023
LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models
arXiv 2023
Parameter-efficient Model Adaptation for Vision Transformers
arXiv 2022
Imagination-Augmented Natural Language Understanding
NAACL 2022 7
Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis
arXiv 2022
ComCLIP: Training-Free Compositional Image and Text Matching
arXiv 2022
Affiliations
Frequent co-authors
10 from 30 papers