Florian Tramer
- Papers: 13
Authored papers
Defeating Prompt Injections by Design
arXiv 2025
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
arXiv 2025
The Jailbreak Tax: How Useful are Your Jailbreak Outputs?
arXiv 2025
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
arXiv 2024
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
arXiv 2024
Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs
arXiv 2024
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
arXiv 2024
Evaluating Superhuman Models with Consistency Checks
arXiv 2023
Universal Jailbreak Backdoors from Poisoned Human Feedback
arXiv 2023
Evading Black-box Classifiers Without Breaking Eggs
arXiv 2023
Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems
arXiv 2022
Large Language Models Can Be Strong Differentially Private Learners
arXiv 2021
Extracting Training Data from Large Language Models
arXiv 2020
Frequent co-authors: 10, from 13 papers