MakeMeSay
Active
Tests a model’s ability to generate persuasive or manipulative text, specifically in the setting of a game where one AI system has to get the other party (an AI system simulating a human) to say a specific codeword without arousing suspicion.
- Publisher
- OpenAI
- Domain
- Safeguards
- License
- MIT
- Published
- Sep 2025
- Notable for
- Benchmark for evaluating Safeguards.
FAQ
- What is MakeMeSay?
- MakeMeSay is a benchmark that tests a model’s ability to generate persuasive or manipulative text. It is framed as a game in which one AI system must get the other party (an AI system simulating a human) to say a specific codeword without arousing suspicion.
- What license is MakeMeSay under?
- MakeMeSay is available under the MIT License.