Dawn Song
Professor of CS at UC Berkeley; one of the world's most-cited security researchers; works on AI safety, agentic AI, and decentralized intelligence.
- Role: professor
- Currently at: University of California, Berkeley
- Twitter: twitter.com/dawnsongtweets
- GitHub: github.com/dawn-song
- Scholar: scholar.google.com/citations
- Papers: 46
Authored papers
46
AgenticPay: A Multi-Agent LLM Negotiation System for Buyer-Seller Transactions
arXiv 2026
DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents
arXiv 2026
dLLM: Simple Diffusion Language Modeling
arXiv 2026
MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI
arXiv 2026
When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents
arXiv 2026
InfoSynth: Information-Guided Benchmark Synthesis for LLMs
arXiv 2026
Adaptation of Agentic AI
arXiv 2025
An Illusion of Progress? Assessing the Current State of Web Agents
arXiv 2025
VERINA: Benchmarking Verifiable Code Generation
arXiv 2025
Learning to Reason without External Rewards
arXiv 2025
Scalable Best-of-N Selection for Large Language Models via Self-Certainty
arXiv 2025
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
arXiv 2025
Progent: Programmable Privilege Control for LLM Agents
arXiv 2025
FrontierCS: Evolving Challenges for Evolving Intelligence
arXiv 2025
Improving LLM Safety Alignment with Dual-Objective Optimization
arXiv 2025
Predicting Task Performance with Context-aware Scaling Laws
arXiv 2025
AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
arXiv 2025
Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT?
arXiv 2025
Can LLMs Design Good Questions Based on Context?
arXiv 2025
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs
arXiv 2025
SteeringControl: Holistic Evaluation of Alignment Steering in LLMs
arXiv 2025
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases
arXiv 2024
Tamper-Resistant Safeguards for Open-Weight LLMs
arXiv 2024
KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking
arXiv 2024
RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
arXiv 2024
BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models
arXiv 2024
C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models
arXiv 2024
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
arXiv 2024
Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning
arXiv 2024
Multimodal Situational Safety
arXiv 2024
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
arXiv 2024
Can Editing LLMs Inject Harm?
arXiv 2024
RedCode: Risky Code Execution and Generation Benchmark for Code Agents
arXiv 2024
Representation Engineering: A Top-Down Approach to AI Transparency
arXiv 2023
Agent Instructs Large Language Models to be General Zero-Shot Reasoners
arXiv 2023
TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
CVPR 2023
Benchmarking Language Models for Code Syntax Understanding
arXiv 2022
Forecasting Future World Events with Neural Networks
arXiv 2022
Measuring Mathematical Problem Solving With the MATH Dataset
NeurIPS
Measuring Coding Challenge Competence With APPS
arXiv 2021
Measuring Massive Multitask Language Understanding
ICLR
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
ICCV 2021
Aligning AI With Shared Human Values
arXiv 2020
Extracting Training Data from Large Language Models
arXiv 2020
Natural Adversarial Examples
CVPR 2021
Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty
Frequent co-authors
10 from 46 papers
Bo Li
Dan Hendrycks (director)
Xuandong Zhao
Jacob Steinhardt (founder)
Steven Basart (researcher)
Mantas Mazeika (researcher)
Tianneng Shi
Andy Zou (founder)
Chenguang Wang
Chaowei Xiao