TextArena
Active
70+ competitive text games (negotiation, social deduction, language puzzles) in which models play head-to-head and a TrueSkill rating is fit to the results.
- Publisher
- Laude Institute
- Capabilities
- Multi-Turn Dialog, Planning, Instruction Following
- Domain
- agentic
- Format
- Openenv
- Size
- 70 tasks
- License
- MIT
- Published
- Feb 2025
- Notable for
- Benchmark for evaluating multi-turn dialog, planning, and instruction following in the agentic domain.
- Canonical
- textarena.ai
Related tools (8)
Implementations, trainers, datasets, and scaffolds linked to this eval.
Papers (2)
FAQ
- What is TextArena?
- 70+ competitive text games (negotiation, social deduction, language puzzles) in which models play head-to-head and a TrueSkill rating is fit to the results.
- What capabilities does TextArena test?
- TextArena evaluates multi-turn dialog, planning, and instruction following.
- How can a model improve its TextArena score?
- Tools linked to TextArena on Sophon include Openenv Textarena RL Env (Hugging Face), Openenv Textarena RL Env (Prime Intellect), Sudoku RL Env (Community), and Hangman RL Env (Community). These are RL environments, datasets, and scaffolds that target this eval.
- What license is TextArena under?
- TextArena is released under the MIT License.
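
The description above says a TrueSkill rating is fit from head-to-head play. As a rough illustration of what that means, here is a minimal sketch of the standard two-player, no-draw TrueSkill update. It is not TextArena's code: the function name `update_1v1` is hypothetical, the constants are the commonly used TrueSkill defaults (assumed here, not confirmed for TextArena), and the dynamics term that re-inflates uncertainty between games is left out for brevity.

```python
import math

# Commonly used TrueSkill defaults (an assumption; TextArena's settings may differ).
MU0 = 25.0          # prior mean skill
SIGMA0 = MU0 / 3    # prior skill uncertainty
BETA = MU0 / 6      # per-game performance noise


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def update_1v1(winner, loser, beta=BETA):
    """Return updated (mu, sigma) pairs for the winner and loser of one game.

    Hypothetical helper: two players, no draws, no dynamics term.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    # Total standard deviation of the performance difference.
    c = math.sqrt(2 * beta**2 + sigma_w**2 + sigma_l**2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # mean correction: larger when the result is a surprise
    w = v * (v + t)         # variance correction, always between 0 and 1
    new_winner = (
        mu_w + sigma_w**2 / c * v,
        sigma_w * math.sqrt(1 - sigma_w**2 / c**2 * w),
    )
    new_loser = (
        mu_l - sigma_l**2 / c * v,
        sigma_l * math.sqrt(1 - sigma_l**2 / c**2 * w),
    )
    return new_winner, new_loser


if __name__ == "__main__":
    model_a = (MU0, SIGMA0)
    model_b = (MU0, SIGMA0)
    model_a, model_b = update_1v1(model_a, model_b)  # model_a won this game
    print("winner:", model_a)
    print("loser: ", model_b)
```

After one game between two unrated players, the winner's mean moves up, the loser's moves down by the same amount, and both uncertainties shrink. Repeating the update over many games is what lets a leaderboard rank models that never played each other directly.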