MulSeT

Multi-Category eval and train data for images

Domain: rl-env
License: unknown
Published: Oct 2025

Attribution

Leaderboard scores: prime-hub

Top score: 70.0% by Gemini 2.5 Pro, with 7 models reporting (1 from frontier labs).

Score history

[Chart: score history from Jun 25 to Sep 25, y-axis 0% to 100%; series shown: Gemini 2.5 Pro]

Top models

[Bar chart: MulSeT scores for 7 models. Highest value: Gemini 2.5 Pro at 70.]

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is MulSeT?
MulSeT provides multi-category eval and train data for images.
What is the current top score on MulSeT?
The top reported score is 70.0% by Gemini 2.5 Pro, across 7 models reporting (1 from frontier labs).
How can a model improve its MulSeT score?
Tools linked to MulSeT on Sophon include Mulset RL Env (Community). Linked tools cover RL environments, datasets, and scaffolds that target this eval.
What license is MulSeT under?
The license for MulSeT is listed as unknown.