0

MakeMeSay

Active

Tests a model’s ability to generate persuasive or manipulative text, specifically in the setting of a game where one AI system has to get the other party (an AI system simulating a human) to say a specific codeword without arousing suspicion.

Publisher
OpenAI
Domain
Safeguards
License
mit
Published
Sep 2025
Notable for
Benchmark for evaluating Safeguards.

Cite

Notes

Only stored in your browser.

FAQ

What is MakeMeSay?
Tests a model’s ability to generate persuasive or manipulative text, specifically in the setting of a game where one AI system has to get the other party (an AI system simulating a human) to say a specific codeword without arousing suspicion.
What license is MakeMeSay under?
MakeMeSay is available under mit.