Huan Sun
- Papers: 37
Authored papers
QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks
arXiv 2026
LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning
arXiv 2026
Emergent Social Intelligence Risks in Generative Multi-Agent Systems
arXiv 2026
When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents
arXiv 2026
Bridging Online and Offline RL: Contextual Bandit Learning for Multi-Turn Code Generation
arXiv 2026
When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents
arXiv 2026
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
arXiv 2025
An Illusion of Progress? Assessing the Current State of Web Agents
arXiv 2025
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
arXiv 2025
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
arXiv 2025
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
arXiv 2025
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
arXiv 2025
Beyond Clicking: A Step Towards Generalist GUI Grounding via Text Dragging
arXiv 2025
Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure
arXiv 2025
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
arXiv 2024
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
arXiv 2024
GPT-4V(ision) is a Generalist Web Agent, if Grounded
arXiv 2024
ChemToolAgent: The Impact of Tools on Language Agents for Chemistry Problem Solving
arXiv 2024
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
arXiv 2024
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
arXiv 2024
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
arXiv 2024
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
arXiv 2024
AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs
arXiv 2024
eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data
arXiv 2024
When is Tree Search Useful for LLM Planning? It Depends on the Discriminator
arXiv 2024
A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
arXiv 2024
AttributionBench: How Hard is Automatic Attribution Evaluation?
arXiv 2024
Mind2Web: Towards a Generalist Agent for the Web
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
CVPR 2024
AgentBench: Evaluating LLMs as Agents
arXiv 2023
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
NeurIPS 2023
Biomedical Language Models are Robust to Sub-optimal Tokenization
arXiv 2023
Automatic Evaluation of Attribution by Large Language Models
arXiv 2023
Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters
arXiv 2022
Iteratively Prompt Pre-trained Language Models for Chain of Thought
arXiv 2022
TURL: Table Understanding through Representation Learning
arXiv 2020
StaQC: A Systematically Mined Question-Code Dataset from Stack Overflow
arXiv 2018
Frequent co-authors
10 from 37 papers