Xin Eric Wang

Authored papers

30

Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants

arXiv 2026

2026

Rethinking Memory Mechanisms of Foundation Agents in the Second Half: A Survey

arXiv 2026

2026

Auditing Agent Harness Safety

arXiv 2026

2026

Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling

arXiv 2026

2026

SafeGround: Know When to Trust GUI Grounding Models via Uncertainty Calibration

arXiv 2026

2026

On the Reliability of Computer Use Agents

arXiv 2026

2026

Group-Evolving Agents: Open-Ended Self-Improvement via Experience Sharing

arXiv 2026

2026

The Unreasonable Effectiveness of Scaling Agents for Computer Use

arXiv 2025

2025

Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space

arXiv 2025

2025

More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

arXiv 2025

2025

GRIT: Teaching MLLMs to Think with Images

arXiv 2025

2025

Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models

arXiv 2025

2025

Agents of Change: Self-Evolving LLM Agents for Strategic Planning

arXiv 2025

2025

Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

arXiv 2025

2025

Hidden in Plain Sight: Reasoning in Underspecified and Misspecified Scenarios for Multimodal LLMs

arXiv 2025

2025

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

arXiv 2024

2024

Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA

arXiv 2024

2024

Multimodal Situational Safety

arXiv 2024

2024

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

arXiv 2024

2024

Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

arXiv 2024

2024

MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens

arXiv 2023

2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

NeurIPS 2023 11

2023

Multimodal Procedural Planning via Dual Text-Image Prompting

arXiv 2023

2023

LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation

2023

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

arXiv 2023

2023

LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models

arXiv 2023

2023

Parameter-efficient Model Adaptation for Vision Transformers

arXiv 2022

2022

Imagination-Augmented Natural Language Understanding

NAACL 2022 7

2022

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

arXiv 2022

2022

ComCLIP: Training-Free Compositional Image and Text Matching

arXiv 2022

2022

Affiliations

No known affiliations.

Frequent co-authors

10

from 30 papers