Shuyan Zhou
Assistant professor at Duke CS; first author of WebArena, the realistic web-agent benchmark; previously at Meta GenAI, working on Llama computer use.
- Role: professor
- Currently at: Duke University Computer Science
- Twitter: twitter.com/shuyanzhxyc
- GitHub: github.com/shuyanzhou
- Scholar: scholar.google.com/citations
- Papers: 22
Authored papers (22)
Learning Personalized Agents from Human Feedback
arXiv 2026
Classroom Final Exam: An Instructor-Tested Reasoning Benchmark
arXiv 2026
Modeling Distinct Human Interaction in Web Agents
arXiv 2026
OSWorld-Verified: A Cleaner, More Reliable Computer-Use Benchmark
blog
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
arXiv 2025
The Geometry of Reasoning: Flowing Logics in Representation Space
arXiv 2025
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
arXiv 2025
FronTalk: Benchmarking Front-End Development as Conversational Code Generation with Multi-Modal Feedback
arXiv 2025
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
NeurIPS
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
ACL
WebArena: A Realistic Web Environment for Building Autonomous Agents
ICLR
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
arXiv 2024
WebCanvas: Benchmarking Web Agents in Online Environments
arXiv 2024
Beyond Browsing: API-Based Web Agents
arXiv 2024
Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents
arXiv 2024
CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
arXiv 2023
Causal Reasoning of Entities and Events in Procedural Texts
arXiv 2023
DocPrompting: Generating Code by Retrieving the Docs
arXiv 2022
Language Models of Code are Few-Shot Commonsense Learners
arXiv 2022
Execution-Based Evaluation for Open-Domain Code Generation
arXiv 2022
MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages
arXiv 2022
Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data
ACL 2022 5
Eval contributions
1 Affiliations
Frequent co-authors
10 from 22 papers
Graham Neubig
professor
Frank F. Xu
researcher
Uri Alon
researcher
Daniel Fried
professor
Danyang Zhang
researcher
Tao Yu
professor
Tianbao Xie
grad-student
Tianyue Ou
Yiheng Xu
researcher