Diyi Yang
- Papers: 42
Authored papers
Towards Execution-Grounded Automated AI Research
arXiv 2026
CooperBench: Why Coding Agents Cannot be Your Teammates Yet
arXiv 2026
SWE-smith: Scaling Data for Software Engineering Agents
arXiv 2025
SynthesizeMe! Inducing Persona-Guided Prompts for Personalized Reward Models in LLMs
arXiv 2025
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors
arXiv 2025
AutoLibra: Agent Metric Induction from Open-Ended Feedback
arXiv 2025
Real-Time Reasoning Agents in Evolving Environments
arXiv 2025
EgoNormia: Benchmarking Physical Social Norm Understanding
arXiv 2025
ReplicationBench: Can AI Agents Replicate Astrophysics Research Papers?
arXiv 2025
OpenCUA: Open Foundations for Computer-Use Agents
arXiv 2025
The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas
arXiv 2025
GEM: A Gym for Agentic LLMs
arXiv 2025
PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action
arXiv 2024
Are Large Language Models Consistent over Value-laden Questions?
arXiv 2024
Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering
arXiv 2024
How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
arXiv 2024
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions
arXiv 2024
Unintended Impacts of LLM Alignment on Global Representation
arXiv 2024
Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors
arXiv 2024
Aligning Language Models with Demonstrated Feedback
arXiv 2024
Attacking Vision-Language Computer Agents via Pop-ups
arXiv 2024
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration
arXiv 2023
Training Socially Aligned Language Models on Simulated Social Interactions
arXiv 2023
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
arXiv 2023
NormBank: A Knowledge Bank of Situational Social Norms
arXiv 2023
Can Large Language Models Transform Computational Social Science?
arXiv 2023
CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation
arXiv 2023
DADA: Dialect Adaptation via Dynamic Aggregation of Linguistic Rules
arXiv 2023
A Cheaper and Better Diffusion Language Model with Soft-Masked Noise
arXiv 2023
Task-Agnostic Low-Rank Adapters for Unseen English Dialects
arXiv 2023
TADA: Task-Agnostic Dialect Adapters for English
arXiv 2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR
Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension
arXiv 2022
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
arXiv 2022
TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding
NAACL 2022 7
VALUE: Understanding Dialect Disparity in NLU
ACL 2022 5
DAMP: Doubly Aligned Multilingual Parser for Task-Oriented Dialogue
arXiv 2022
Inducing Positive Perspectives with Text Reframing
ACL 2022 5
Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering
Findings (ACL) 2021 8
A Search Engine for Discovery of Scientific Challenges and Directions
NeurIPS Workshop AI4Scien 2021 12
Evaluating Graph Vulnerability and Robustness using TIGER
arXiv 2020
Automatically Neutralizing Subjective Bias in Text
arXiv 2019
Frequent co-authors
10 from 42 papers