Lisanbench

Single-turn evaluation in which the model must generate the longest valid chain of words from a given starting word, where each word differs from the previous one by a single-letter edit. The final score is the sum of the longest valid chain lengths across all starting words.
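
The scoring rule described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual implementation. It assumes that an edit means substituting, inserting, or deleting one letter, that a chain may not repeat a word, and that a chain is counted up to its first invalid step. The function names and the `vocabulary` set are hypothetical.

```python
def is_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by exactly one letter substitution, insertion, or deletion."""
    if a == b:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ.
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    # Deleting one letter from the longer word must yield the shorter word.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def valid_chain_length(start: str, chain: list[str], vocabulary: set[str]) -> int:
    """Count the words accepted before the first invalid step (assumed rule)."""
    seen = {start}
    previous = start
    length = 0
    for word in chain:
        if word not in vocabulary or word in seen or not is_one_edit(previous, word):
            break
        seen.add(word)
        previous = word
        length += 1
    return length


def total_score(responses: dict[str, list[str]], vocabulary: set[str]) -> int:
    """Sum the valid chain lengths over all starting words."""
    return sum(valid_chain_length(start, chain, vocabulary)
               for start, chain in responses.items())


# Hypothetical toy vocabulary and model responses, for illustration only.
vocabulary = {"cat", "cot", "cog", "dog", "dot", "cart"}
responses = {
    "cat": ["cot", "cog", "dog"],
    "dog": ["dot", "cot", "xyz"],  # "xyz" is not a word, so the chain stops there
}
score = total_score(responses, vocabulary)
```

Whether the starting word itself counts toward the chain length, and how the real benchmark aggregates across runs, are not stated here, so treat those details as open.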

Domain: rl-env
License: unknown
Published: Sep 2025

Attribution

Leaderboard scores: prime-hub

Top score: 13.22 by Qwen3 4B (1 model reporting)

Top models

[Bar chart: Lisanbench scores, 1 model. Highest value: Qwen3 4B at 13.2.]

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is Lisanbench?
Lisanbench is a single-turn evaluation in which the model must generate the longest valid chain of words from a given starting word, where each word differs from the previous one by a single-letter edit. The final score is the sum of the longest valid chain lengths across all starting words.
What is the current top score on Lisanbench?
The top reported score is 13.22, achieved by Qwen3 4B. Only 1 model has reported a score so far.
How can a model improve its Lisanbench score?
Training against tools that target this eval is one route. Tools linked to Lisanbench on Sophon include the Lisanbench RL Env (Prime Intellect); linked tools cover RL environments, datasets, and scaffolds.
What license is Lisanbench under?
The license for Lisanbench is listed as unknown.