Longcot Rlm New

LongCoT long-horizon reasoning evaluation environment using RLM with Python REPL

Domain: rl-env
License: unknown
Published: Apr 2026

Attribution

Leaderboard scores: prime-hub

Top score: 24.0% by GPT-5.2 (1 model reporting, 1 from a frontier lab).

Top models

GPT-5.2: 24.0%

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is Longcot Rlm New?
It is a LongCoT long-horizon reasoning evaluation environment that uses RLM with a Python REPL.
What is the current top score on Longcot Rlm New?
The top reported score is 24.0% by GPT-5.2, across 1 model reporting (1 from frontier labs).
How can a model improve its Longcot Rlm New score?
Sophon links one tool to Longcot Rlm New: RLM NEW RL Env (Community). Linked tools are RL environments, datasets, and scaffolds that target this eval.
What license is Longcot Rlm New under?
The license for Longcot Rlm New is unknown.