Diyi Yang

SynthesizeMe! Inducing Persona-Guided Prompts for Personalized Reward Models in LLMs

arXiv 2025

Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors

arXiv 2025

AutoLibra: Agent Metric Induction from Open-Ended Feedback

arXiv 2025

Real-Time Reasoning Agents in Evolving Environments

arXiv 2025

OpenCUA: Open Foundations for Computer-Use Agents

arXiv 2025

The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas

arXiv 2025

ReplicationBench: Can AI Agents Replicate Astrophysics Research Papers?

arXiv 2025

EgoNormia: Benchmarking Physical Social Norm Understanding

arXiv 2025

GEM: A Gym for Agentic LLMs

arXiv 2025

PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action

arXiv 2024

Are Large Language Models Consistent over Value-laden Questions?

arXiv 2024

Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering

arXiv 2024

How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs

arXiv 2024

Aligning Language Models with Demonstrated Feedback

arXiv 2024

Unintended Impacts of LLM Alignment on Global Representation

arXiv 2024

Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors

arXiv 2024

Attacking Vision-Language Computer Agents via Pop-ups

arXiv 2024

Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

arXiv 2024

A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration

arXiv 2023

Training Socially Aligned Language Models on Simulated Social Interactions

arXiv 2023

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

arXiv 2023

A Cheaper and Better Diffusion Language Model with Soft-Masked Noise

arXiv 2023

DADA: Dialect Adaptation via Dynamic Aggregation of Linguistic Rules

arXiv 2023

NormBank: A Knowledge Bank of Situational Social Norms

arXiv 2023

Can Large Language Models Transform Computational Social Science?

arXiv 2023

CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation

arXiv 2023

Task-Agnostic Low-Rank Adapters for Unseen English Dialects

arXiv 2023

TADA: Task-Agnostic Dialect Adapters for English

arXiv 2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

Inducing Positive Perspectives with Text Reframing

ACL 2022

VALUE: Understanding Dialect Disparity in NLU

ACL 2022

DAMP: Doubly Aligned Multilingual Parser for Task-Oriented Dialogue

arXiv 2022

Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension

arXiv 2022

On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning

arXiv 2022

TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding

NAACL 2022