Percy Liang
Associate professor at Stanford CS; founding director of CRFM; lead author of HELM and one of the field's most influential voices on foundation-model evaluation and openness.
- Role: professor
- Twitter: twitter.com/percyliang
- GitHub: github.com/percyliang
- Scholar: scholar.google.com/citations
- Papers: 62
Authored papers (62)
DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents
arXiv 2026
s1: Simple Test-Time Scaling
preprint
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering
arXiv 2025
Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing
arXiv 2025
AHELM: A Holistic Evaluation of Audio-Language Models
arXiv 2025
UQ: Assessing Language Models on Unsolved Questions
arXiv 2025
Auditing Prompt Caching in Language Model APIs
arXiv 2025
Reliable and Efficient Amortized Model-based Evaluation
arXiv 2025
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
COLM
Introducing v0.5 of the AI Safety Benchmark from MLCommons
arXiv 2024
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models
arXiv 2024
BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments
arXiv 2024
VideoAgent: Self-Improving Video Generation
arXiv 2024
Instruction Following without Instruction Tuning
arXiv 2024
AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models
arXiv 2024
RedPajama: an Open Dataset for Training Large Language Models
arXiv 2024
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
arXiv 2024
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
arXiv 2024
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
arXiv 2024
LawInstruct: A Resource for Studying Language Model Adaptation to the Legal Domain
arXiv 2024
Evaluating Self-Supervised Learning via Risk Decomposition
arXiv 2023
Model Editing with Canonical Examples
arXiv 2024
Generative Agents: Interactive Simulacra of Human Behavior
arXiv 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
arXiv 2023
Language-Driven Representation Learning for Robotics
arXiv 2023
Whose Opinions Do Language Models Reflect?
arXiv 2023
The Foundation Model Transparency Index
arXiv 2023
Robust Distortion-free Watermarks for Language Models
arXiv 2023
Benchmarking Large Language Models for News Summarization
arXiv 2023
Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models
arXiv 2023
"No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy
arXiv 2023
Evaluating Verifiability in Generative Search Engines
arXiv 2023
Lost in the Middle: How Language Models Use Long Contexts
arXiv 2023
MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation
arXiv 2023
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Data Selection for Language Models via Importance Resampling
On the Learnability of Watermarks for Language Models
arXiv 2023
Out-of-Domain Robustness via Targeted Augmentations
arXiv 2023
Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness
arXiv 2023
Holistic Evaluation of Language Models
TMLR
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR
Deep Bidirectional Language-Knowledge Graph Pretraining
arXiv 2022
Contrastive Decoding: Open-ended Text Generation as Optimization
arXiv 2022
GreaseLM: Graph REASoning Enhanced Language Models for Question Answering
arXiv 2022
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
arXiv 2022
LinkBERT: Pretraining Language Models with Document Links
ACL 2022 5
Diffusion-LM Improves Controllable Text Generation
arXiv 2022
Truncation Sampling as Language Model Desmoothing
arXiv 2022
Codified audio language modeling learns useful representations for music information retrieval
arXiv 2021
Large Language Models Can Be Strong Differentially Private Learners
LM-Critic: Language Models for Unsupervised Grammatical Error Correction
EMNLP 2021 11
An Explanation of In-context Learning as Implicit Bayesian Inference
Prefix-Tuning: Optimizing Continuous Prompts for Generation
ACL 2021 5
QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering
NAACL 2021 4
LILA: Language-Informed Latent Actions
arXiv 2021
WILDS: A Benchmark of in-the-Wild Distribution Shifts
arXiv 2020
Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
arXiv 2019
Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style Transfer
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Mapping Natural Language Commands to Web Elements
Learning Language Games through Interaction
Compositional Semantic Parsing on Semi-Structured Tables
Frequent co-authors (10, from 62 papers)
Tatsunori Hashimoto (professor)
Michihiro Yasunaga
Christopher D. Manning
Jure Leskovec
Xiang Lisa Li (researcher)
Tony Lee (researcher)
Dorsa Sadigh
John Hewitt
Sang Michael Xie
Siddharth Karamcheti