TextArena

Active

70+ competitive text games (negotiation, social deduction, language puzzles) in which models play head-to-head and are ranked by a fitted TrueSkill rating.

Open

Publisher: Laude Institute
Capabilities: Multi-Turn Dialog, Planning, Instruction Following
Domain: agentic
Format: OpenEnv
Size: 70 tasks
License: MIT
Published: Feb 2025
Notable for: Benchmark for evaluating multi-turn dialog, planning, and instruction following in the agentic domain.
Canonical: textarena.ai
Also on: github.com/LeonGuertler/TextArena
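
The description above says a TrueSkill rating is fit from head-to-head results. As a rough illustration of what one such update looks like, here is a minimal sketch of the standard two-player, no-draw TrueSkill update in plain Python. This is an assumption-laden sketch, not TextArena's actual rating pipeline: the starting values (mu = 25, sigma = 25/3, beta = 25/6) are the conventional TrueSkill defaults, the dynamics term is omitted, and the names `Rating` and `update_ratings` are hypothetical.

```python
import math
from dataclasses import dataclass

# Conventional TrueSkill defaults (assumed; the listing does not state
# which parameters TextArena uses).
MU0 = 25.0
SIGMA0 = MU0 / 3
BETA = SIGMA0 / 2


@dataclass(frozen=True)
class Rating:
    mu: float = MU0        # estimated skill
    sigma: float = SIGMA0  # uncertainty about that estimate


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def update_ratings(winner: Rating, loser: Rating, beta: float = BETA):
    """Return updated (winner, loser) ratings after one decisive game.

    Two-player, no-draw case only; no skill-drift (tau) term.
    """
    c2 = 2 * beta**2 + winner.sigma**2 + loser.sigma**2
    c = math.sqrt(c2)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)   # mean shift factor (large for upsets)
    w = v * (v + t)         # variance shrink factor, between 0 and 1

    new_winner = Rating(
        mu=winner.mu + winner.sigma**2 / c * v,
        sigma=winner.sigma * math.sqrt(1 - winner.sigma**2 / c2 * w),
    )
    new_loser = Rating(
        mu=loser.mu - loser.sigma**2 / c * v,
        sigma=loser.sigma * math.sqrt(1 - loser.sigma**2 / c2 * w),
    )
    return new_winner, new_loser


if __name__ == "__main__":
    a, b = update_ratings(Rating(), Rating())
    print(a, b)
```

Both players' uncertainty shrinks after each game, and an unexpected win moves the means further than an expected one. Leaderboards built on TrueSkill often rank by a conservative score such as `mu - 3 * sigma`; whether TextArena does so is not stated here.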

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

Openenv Textarena RL Env (Hugging Face)

Hugging Face

OpenEnv port of TextArena, a collection of competitive text games (Wordle, Snake, Tic-Tac-Toe, etc.) for evaluating LLM reasoning under interactive rules.

Implementation · RL Env · Planning · Instruction Following · Multi-Turn Dialog

Openenv Textarena RL Env (Prime Intellect)

Prime Intellect

OpenEnv TextArena (Wordle-v0) environment via the OpenEnvEnv adapter

Implementation · RL Env · Openenv · Textarena · Wordle

Sudoku RL Env (Community)

Trains toward · RL Env

Hangman RL Env (Community)

Game environment for hangman, built on top of TextArena

Trains toward · RL Env · Textarena · Reasoning · Game

Wordle RL Env (Prime Intellect)

Prime Intellect

Wordle game environment

Trains toward · RL Env · Textarena · Reasoning · Game

HARD Wordle RL Env (Community)

Hard mode Wordle game environment with enforced letter inclusion rules.

Trains toward · RL Env · Hard Mode

Hurdle Wordle RL Env (Community)

Hurdle Wordle game environment, a Wordle variant that reports only the counts of green and yellow letters.

Trains toward · RL Env · Hurdle Wordle · Wordle · LLM Testing

Wordle RL Env (Community)

Wordle game environment

Trains toward · RL Env · Textarena · Reasoning · Game
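
Several of the environments listed above are Wordle variants that differ only in the feedback or the guess constraints. The sketch below illustrates those differences in plain Python; it is not taken from any of the listed environments. The function names are hypothetical, and the hard-mode rule shown (revealed greens stay in place, revealed yellows must be reused) is the conventional Wordle hard-mode rule, assumed here to be what "enforced letter inclusion rules" refers to.

```python
from collections import Counter


def wordle_feedback(secret: str, guess: str) -> str:
    """Standard per-letter feedback: 'G' green, 'Y' yellow, 'X' absent."""
    if len(secret) != len(guess):
        raise ValueError("secret and guess must have the same length")
    feedback = ["X"] * len(guess)
    remaining = Counter()  # secret letters not matched by a green
    for i, (s, g) in enumerate(zip(secret, guess)):
        if s == g:
            feedback[i] = "G"
        else:
            remaining[s] += 1
    # Second pass: hand out yellows while unmatched copies remain,
    # which handles repeated letters correctly.
    for i, g in enumerate(guess):
        if feedback[i] == "X" and remaining[g] > 0:
            feedback[i] = "Y"
            remaining[g] -= 1
    return "".join(feedback)


def hurdle_feedback(secret: str, guess: str) -> tuple[int, int]:
    """Hurdle-style feedback: only (green count, yellow count)."""
    marks = wordle_feedback(secret, guess)
    return marks.count("G"), marks.count("Y")


def violates_hard_mode(prev_guess: str, prev_marks: str, new_guess: str) -> bool:
    """True if new_guess ignores hints revealed by the previous guess.

    Assumed rule: greens must stay in position, and every green or
    yellow letter must appear at least as often as it was revealed.
    """
    for i, (ch, mark) in enumerate(zip(prev_guess, prev_marks)):
        if mark == "G" and new_guess[i] != ch:
            return True
    required = Counter(
        ch for ch, mark in zip(prev_guess, prev_marks) if mark in "GY"
    )
    available = Counter(new_guess)
    return any(available[ch] < n for ch, n in required.items())


if __name__ == "__main__":
    marks = wordle_feedback("crane", "react")
    print(marks, hurdle_feedback("crane", "react"))
    print(violates_hard_mode("react", marks, "plant"))
```

The count-only feedback carries strictly less information than the per-letter marks, which is why the Hurdle variant is the harder reasoning task.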

Papers

TextArena: Multi-Agent Text-Based Games for LLM Evaluation

preprint · 2025

Open-source library of 100+ text-based multi-agent games (negotiation, deception, strategy) for evaluating LLMs in head-to-head interactive settings.

Introduces TextArena.

FAQ

What is TextArena?: 70+ competitive text games (negotiation, social deduction, language puzzles) in which models play head-to-head and are ranked by a fitted TrueSkill rating.
What capabilities does TextArena test?: TextArena evaluates multi-turn dialog, planning, and instruction following.
How can a model improve its TextArena score?: Tools linked to TextArena on Sophon include Openenv Textarena RL Env (Hugging Face), Openenv Textarena RL Env (Prime Intellect), Sudoku RL Env (Community), and Hangman RL Env (Community). These are RL environments, datasets, and scaffolds that target this eval.
What license is TextArena under?: TextArena is available under the MIT license.