Lhaw Rlm

LHAW RLM environment: underspecified prompts, simulated user clarification (ask_user), and LLM judge scoring on the ScaleAI/lhaw dataset.
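
The loop this description implies can be illustrated with a toy sketch. It is not the environment's actual code: every name here (`Episode`, `toy_judge`, `run_episode`, the two policies) is hypothetical, only `ask_user` is taken from the description and its signature is assumed, and a simple string match stands in for the LLM judge.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """Hypothetical task: an underspecified prompt plus details only the user knows."""
    prompt: str
    hidden_details: dict[str, str]
    questions_asked: list[str] = field(default_factory=list)


def ask_user(episode: Episode, question: str) -> str:
    """Simulated user (assumed behaviour): answers only if the question names a hidden detail."""
    episode.questions_asked.append(question)
    for key, value in episode.hidden_details.items():
        if key in question.lower():
            return value
    return "I'm not sure what you mean."


def toy_judge(answer: str, episode: Episode) -> float:
    """Stand-in for an LLM judge: fraction of hidden details reflected in the answer."""
    if not episode.hidden_details:
        return 1.0
    hits = sum(v.lower() in answer.lower() for v in episode.hidden_details.values())
    return hits / len(episode.hidden_details)


def run_episode(episode: Episode, policy) -> float:
    """Run one policy on one episode and return the judge's score."""
    answer = policy(episode, lambda q: ask_user(episode, q))
    return toy_judge(answer, episode)


def clarifying_policy(episode: Episode, ask) -> str:
    """Asks for the missing detail before answering."""
    fmt = ask("Which format should the report use?")
    return f"Report written as {fmt}."


def guessing_policy(episode: Episode, ask) -> str:
    """Answers immediately and guesses the missing detail."""
    return "Report written as PDF."
```

In this sketch, a policy that asks about the missing detail before answering scores higher than one that guesses, which is the behaviour an environment of this kind would be expected to reward.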

Domain: rl-env
License: unknown
Published: Mar 2026

Attribution

Leaderboard scores: prime-hub

Top score: 43.8% by GPT-4.1 Mini (1 model reporting, 1 frontier)

Top models

[Bar chart: Lhaw Rlm scores, 1 model. Highest value: GPT-4.1 Mini at 43.8.]

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is Lhaw Rlm?
Lhaw Rlm (LHAW RLM) is an RL environment that gives a model underspecified prompts, provides clarification from a simulated user through ask_user, and scores results with an LLM judge on the ScaleAI/lhaw dataset.
What is the current top score on Lhaw Rlm?
The top reported score is 43.8%, by GPT-4.1 Mini. One model has reported a score, and it is from a frontier lab.
How can a model improve its Lhaw Rlm score?
Sophon links tools that target this eval, such as RL environments, datasets, and scaffolds. For Lhaw Rlm these include LHAW RLM RL Env (Community).
What license is Lhaw Rlm under?
The license for Lhaw Rlm is listed as unknown.