Needle in a Haystack

Classic long-context stress test that hides a short fact ("the needle") inside a long document and asks the model to retrieve it.
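
The setup described above can be illustrated with a minimal sketch. It is not the benchmark's reference implementation: the function names, the `depth` parameter, the filler text, and the substring-based scoring are all assumptions chosen for illustration.

```python
def build_haystack(filler_sentences, needle, depth):
    """Insert `needle` into the filler text at a fractional `depth`.

    depth=0.0 places the needle at the start, depth=1.0 at the end.
    (Hypothetical helper, not part of any published harness.)
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler_sentences))
    sentences = filler_sentences[:position] + [needle] + filler_sentences[position:]
    return " ".join(sentences)


def is_retrieved(model_answer, expected):
    """Simplified scoring: case-insensitive substring match on the expected fact."""
    return expected.lower() in model_answer.lower()


# Toy example: ten filler sentences with the needle placed in the middle.
filler = [f"Filler sentence number {i}." for i in range(10)]
needle = "The secret code is 7421."
document = build_haystack(filler, needle, depth=0.5)
prompt = document + "\n\nQuestion: What is the secret code?"
```

In a real run the filler would be far longer, and `prompt` would be sent to the model under test, with its reply passed to `is_retrieved`. Repeating this across several document lengths and needle positions would show where retrieval starts to fail.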

Publisher: Independent
Format: Custom
Size: Unknown tasks
License: MIT
Published: Nov 2023
Notable for: Benchmark for evaluating long context and retrieval.

Related tools (2)

Implementations, trainers, datasets and scaffolds linked to this eval.

Papers: 1
Contributors: 1

FAQ

What is Needle in a Haystack?
Classic long-context stress test that hides a short fact ("the needle") inside a long document and asks the model to retrieve it.
What capabilities does Needle in a Haystack test?
Needle in a Haystack evaluates long-context handling and retrieval.
How can a model improve its Needle in a Haystack score?
Tools linked to Needle in a Haystack on Sophon include Haystack RLM RL Env (Prime Intellect) and Context Needle RL Env (Community). These are RL environments, datasets, and scaffolds that target this eval.
What license is Needle in a Haystack under?
Needle in a Haystack is available under the MIT license.