Diyi Yang

Papers
42

Authored papers

Towards Execution-Grounded Automated AI Research

arXiv 2026

2026

CooperBench: Why Coding Agents Cannot be Your Teammates Yet

arXiv 2026

2026

SWE-smith: Scaling Data for Software Engineering Agents

arXiv 2025

2025

SynthesizeMe! Inducing Persona-Guided Prompts for Personalized Reward Models in LLMs

arXiv 2025

2025

Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors

arXiv 2025

2025

AutoLibra: Agent Metric Induction from Open-Ended Feedback

arXiv 2025

2025

Real-Time Reasoning Agents in Evolving Environments

arXiv 2025

2025

EgoNormia: Benchmarking Physical Social Norm Understanding

arXiv 2025

2025

ReplicationBench: Can AI Agents Replicate Astrophysics Research Papers?

arXiv 2025

2025

OpenCUA: Open Foundations for Computer-Use Agents

arXiv 2025

2025

The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas

arXiv 2025

2025

GEM: A Gym for Agentic LLMs

arXiv 2025

2025

PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action

arXiv 2024

2024

Are Large Language Models Consistent over Value-laden Questions?

arXiv 2024

2024

Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering

arXiv 2024

2024

How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs

arXiv 2024

2024

Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

arXiv 2024

2024

Unintended Impacts of LLM Alignment on Global Representation

arXiv 2024

2024

Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors

arXiv 2024

2024

Aligning Language Models with Demonstrated Feedback

arXiv 2024

2024

Attacking Vision-Language Computer Agents via Pop-ups

arXiv 2024

2024

A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration

arXiv 2023

2023

Training Socially Aligned Language Models on Simulated Social Interactions

arXiv 2023

2023

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

arXiv 2023

2023

NormBank: A Knowledge Bank of Situational Social Norms

arXiv 2023

2023

Can Large Language Models Transform Computational Social Science?

arXiv 2023

2023

CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation

arXiv 2023

2023

DADA: Dialect Adaptation via Dynamic Aggregation of Linguistic Rules

arXiv 2023

2023

A Cheaper and Better Diffusion Language Model with Soft-Masked Noise

arXiv 2023

2023

Task-Agnostic Low-Rank Adapters for Unseen English Dialects

arXiv 2023

2023

TADA: Task-Agnostic Dialect Adapters for English

arXiv 2023

2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

2022

Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension

arXiv 2022

2022

On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning

arXiv 2022

2022

TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding

NAACL 2022 7

2022

VALUE: Understanding Dialect Disparity in NLU

ACL 2022 5

2022

DAMP: Doubly Aligned Multilingual Parser for Task-Oriented Dialogue

arXiv 2022

2022

Inducing Positive Perspectives with Text Reframing

ACL 2022 5

2022

Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering

Findings (ACL) 2021 8

2021

A Search Engine for Discovery of Scientific Challenges and Directions

NeurIPS Workshop AI4Scien 2021 12

2021

Evaluating Graph Vulnerability and Robustness using TIGER

arXiv 2020

2020

Automatically Neutralizing Subjective Bias in Text

arXiv 2019

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 42 papers