About

Sophon answers one question that no single GitHub README, paper PDF, or leaderboard answers on its own: my model has a gap on eval X - what did other teams use to close it, and where does that show up on a leaderboard?

Eval results, the datasets and RL environments that improve them, and the rankings they feed are normally scattered across dozens of places. Sophon pulls them into one readable surface and connects them by hand - evals ↔ tools ↔ models ↔ leaderboards ↔ capabilities, plus the papers and labs behind each one.

How it's organized

Everything hangs off the model being measured. The pieces around it are easy to confuse, so Sophon defines each one strictly:

Evals are the tests - run a model, get one score (SWE-bench, GPQA, MMLU-Pro, IFEval).
Tools are what you train or build with to raise that score: RL environments, fine-tuning datasets, and scaffolds.
Leaderboards are published rankings of many models - single-benchmark boards, aggregated indices, or human-preference Elo.
Capabilities are the skills evals test - coding, math, agents, reasoning. They're the thread the Recommender follows from a weakness to the tools that fix it.

What you can do

Recommender - pick a capability and get RL environments and datasets ranked by how many of its evals they're known to lift.
Compare - put models head-to-head across their eval scores and leaderboard standings.
Feed - everything new (evals, tools, models, papers) in real ship-date order.
Browse the full evals, tools, models, and leaderboards catalogs, press ⌘K to search, and star anything to save it to a collection.
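
As a rough illustration of the Recommender's ranking idea (count how many of a capability's evals each tool is known to lift, then sort), here is a minimal Python sketch. It is not Sophon's implementation: the Tool class, the recommend function, the kind labels, the tie-break by name, and all eval and tool names in the example are hypothetical and exist only to make the idea concrete.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tool:
    """Hypothetical catalog entry: an RL environment, dataset, or scaffold."""
    name: str
    kind: str  # assumed labels: "rl_environment", "dataset", "scaffold"
    lifts: frozenset = field(default_factory=frozenset)  # evals it is known to lift


def recommend(capability, evals_by_capability, tools):
    """Rank RL environments and datasets by how many of the capability's evals they lift."""
    target = set(evals_by_capability.get(capability, ()))
    scored = [
        (len(target & tool.lifts), tool)
        for tool in tools
        if tool.kind in {"rl_environment", "dataset"}  # scaffolds are not ranked here
    ]
    # Drop tools that lift none of the target evals; break ties by name (an assumption).
    ranked = sorted(
        (pair for pair in scored if pair[0] > 0),
        key=lambda pair: (-pair[0], pair[1].name),
    )
    return [(tool.name, count) for count, tool in ranked]


# Made-up data, purely for illustration.
evals_by_capability = {"coding": {"eval-1", "eval-2"}}
tools = [
    Tool("env-a", "rl_environment", frozenset({"eval-1", "eval-2"})),
    Tool("dataset-b", "dataset", frozenset({"eval-2"})),
    Tool("scaffold-c", "scaffold", frozenset({"eval-1", "eval-2"})),
    Tool("dataset-d", "dataset", frozenset({"eval-9"})),
]
result = recommend("coding", evals_by_capability, tools)
print(result)
```

In this sketch a tool's score is simply the size of the overlap between the evals it lifts and the evals linked to the chosen capability, which is why the capability acts as the thread from a weakness to the tools that address it.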

Curated, not crowdsourced

Sophon is editorial, not a wiki. Entries are imported from canonical open sources - arXiv, Semantic Scholar, Hugging Face, Papers With Code, LMArena, Artificial Analysis, and the RL-environment registries at Prime Intellect, Nous Research, Meta, and OpenReward - then linked into a controlled graph rather than free-form tags.

Every artefact keeps its provenance: look for the source badge under any README, abstract, or model card. We host full content only where its license clearly allows it, and link out otherwise - see the content policy and attribution.

What's next

Live today: the full catalog, the Recommender, model comparison, the activity feed, search, collections, and a public API and CLI. On the roadmap:

Personal accounts - private notes, saved dashboards, and following the authors, labs, and sources you care about.
Hosted eval runs with your own API keys, and embeddable score badges.

More

API & CLI - query the catalog from code, agents, or the terminal.
Changelog - notable new features, fixes, and improvements, newest first.
How trending works - the velocity and recency formula behind the feed's Trending tab.
Content policy - what we host, excerpt, or link out to, and how licenses decide.
Sources & sync status - every ingestion adapter, with last-sync timestamps.
Attribution - credit to the upstream data providers.
Takedowns & DMCA - how to request removal or re-tiering.

Start exploring from the home page.

Sophon is built and maintained by 21st Labs.
