TextArena

Active

70+ competitive text games (negotiation, social deduction, language puzzles) in which models play head-to-head and each model receives a TrueSkill rating fitted to the match results.

Domain: agentic
Format: Openenv
Size: 70 tasks
License: MIT
Published: Feb 2025
Notable for: Benchmark for evaluating multi-turn dialog, planning, and instruction following in the agentic domain.
Canonical: textarena.ai

Related tools

8

Implementations, trainers, datasets and scaffolds linked to this eval.

Papers

2

FAQ

What is TextArena?
TextArena is a collection of 70+ competitive text games (negotiation, social deduction, language puzzles) in which models play head-to-head and each model receives a TrueSkill rating fitted to the match results.
What capabilities does TextArena test?
TextArena evaluates multi-turn dialog, planning, and instruction following.
How can a model improve its TextArena score?
Sophon links RL environments, datasets, and scaffolds that target this eval, including Openenv Textarena RL Env (Hugging Face), Openenv Textarena RL Env (Prime Intellect), Sudoku RL Env (Community), and Hangman RL Env (Community).
What license is TextArena under?
TextArena is available under the MIT license.
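
To make the rating step mentioned above concrete, here is a minimal sketch of a two-player TrueSkill update after a single decisive game. It is an illustration only, not TextArena's actual rating code: it assumes no draws, omits the dynamics (tau) term, and uses the commonly cited default prior (mu = 25, sigma = 25/3, beta = 25/6). The names `Rating` and `update` are hypothetical.

```python
import math
from dataclasses import dataclass

# Commonly cited TrueSkill defaults (assumed here, not taken from TextArena).
MU0 = 25.0
SIGMA0 = MU0 / 3
BETA = MU0 / 6


@dataclass(frozen=True)
class Rating:
    mu: float = MU0        # estimated skill
    sigma: float = SIGMA0  # uncertainty about that estimate


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update(winner: Rating, loser: Rating, beta: float = BETA):
    """Return (new_winner, new_loser) after one win/loss result.

    Simplified two-player update: no draw margin, no dynamics term.
    """
    c = math.sqrt(2.0 * beta**2 + winner.sigma**2 + loser.sigma**2)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)   # size of the mean shift
    w = v * (v + t)         # size of the variance shrink, between 0 and 1

    def shrink(sigma: float) -> float:
        return sigma * math.sqrt(1.0 - (sigma**2 / c**2) * w)

    new_winner = Rating(winner.mu + (winner.sigma**2 / c) * v, shrink(winner.sigma))
    new_loser = Rating(loser.mu - (loser.sigma**2 / c) * v, shrink(loser.sigma))
    return new_winner, new_loser


if __name__ == "__main__":
    a, b = update(Rating(), Rating())
    print(a, b)
```

The winner's mean rises and the loser's falls, and both uncertainties shrink because the game carried information. An upset (a low-rated model beating a high-rated one) moves the means further than an expected result does.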