Percy Liang

Associate professor of computer science at Stanford; founding director of the Center for Research on Foundation Models (CRFM); lead author of HELM (Holistic Evaluation of Language Models) and a prominent voice on foundation-model evaluation and openness.

Role
professor
Papers
62

Authored papers

62

DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents

arXiv 2026

2026

s1: Simple Test-Time Scaling

preprint

2025

MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering

arXiv 2025

2025

Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

arXiv 2025

2025

AHELM: A Holistic Evaluation of Audio-Language Models

arXiv 2025

2025

UQ: Assessing Language Models on Unsolved Questions

arXiv 2025

2025

Auditing Prompt Caching in Language Model APIs

arXiv 2025

2025

Reliable and Efficient Amortized Model-based Evaluation

arXiv 2025

2025

Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators

COLM

2024

Introducing v0.5 of the AI Safety Benchmark from MLCommons

arXiv 2024

2024

Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models

arXiv 2024

2024

BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments

arXiv 2024

2024

VideoAgent: Self-Improving Video Generation

arXiv 2024

2024

Instruction Following without Instruction Tuning

arXiv 2024

2024

AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models

arXiv 2024

2024

RedPajama: an Open Dataset for Training Large Language Models

arXiv 2024

2024

Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models

arXiv 2024

2024

BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text

arXiv 2024

2024

Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making

arXiv 2024

2024

LawInstruct: A Resource for Studying Language Model Adaptation to the Legal Domain

arXiv 2024

2024

Evaluating Self-Supervised Learning via Risk Decomposition

arXiv 2023

2023

Model Editing with Canonical Examples

arXiv 2024

2024

Generative Agents: Interactive Simulacra of Human Behavior

arXiv 2023

2023

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

arXiv 2023

2023

Language-Driven Representation Learning for Robotics

arXiv 2023

2023

Whose Opinions Do Language Models Reflect?

arXiv 2023

2023

The Foundation Model Transparency Index

arXiv 2023

2023

Robust Distortion-free Watermarks for Language Models

arXiv 2023

2023

Benchmarking Large Language Models for News Summarization

arXiv 2023

2023

Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models

arXiv 2023

2023

"No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy

arXiv 2023

2023

Evaluating Verifiability in Generative Search Engines

arXiv 2023

2023

Lost in the Middle: How Language Models Use Long Contexts

arXiv 2023

2023

MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation

arXiv 2023

2023

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

2023

Data Selection for Language Models via Importance Resampling

2023

On the Learnability of Watermarks for Language Models

arXiv 2023

2023

Out-of-Domain Robustness via Targeted Augmentations

arXiv 2023

2023

Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness

arXiv 2023

2023

Holistic Evaluation of Language Models

TMLR

2022

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

2022

Deep Bidirectional Language-Knowledge Graph Pretraining

arXiv 2022

2022

Contrastive Decoding: Open-ended Text Generation as Optimization

arXiv 2022

2022

GreaseLM: Graph REASoning Enhanced Language Models for Question Answering

arXiv 2022

2022

Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP

arXiv 2022

2022

LinkBERT: Pretraining Language Models with Document Links

ACL 2022

2022

Diffusion-LM Improves Controllable Text Generation

arXiv 2022

2022

Truncation Sampling as Language Model Desmoothing

arXiv 2022

2022

Codified audio language modeling learns useful representations for music information retrieval

arXiv 2021

2021

Large Language Models Can Be Strong Differentially Private Learners

2021

LM-Critic: Language Models for Unsupervised Grammatical Error Correction

EMNLP 2021

2021

An Explanation of In-context Learning as Implicit Bayesian Inference

2021

Prefix-Tuning: Optimizing Continuous Prompts for Generation

ACL 2021

2021

QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering

NAACL 2021

2021

LILA: Language-Informed Latent Actions

arXiv 2021

2021

WILDS: A Benchmark of in-the-Wild Distribution Shifts

arXiv 2020

2020

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

arXiv 2019

2019

Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style Transfer

2018

Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration

2018

Mapping Natural Language Commands to Web Elements

2018

Learning Language Games through Interaction

2016

Compositional Semantic Parsing on Semi-Structured Tables

2015

Affiliations

Currently at

Stanford Center for Research on Foundation Models (CRFM)

professor · university lab

Frequent co-authors

10

from 62 papers