Needle in a Haystack (NIAH)

A long-context retrieval pressure test: insert a random fact (the "needle") at a random depth inside a long document (the "haystack") and ask the model to retrieve it verbatim.
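The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's own code: the names `insert_needle`, `build_prompt`, and `is_retrieved` are hypothetical, depth is measured in characters rather than tokens, and scoring is a simple case-insensitive substring check.

```python
import random


def insert_needle(haystack: str, needle: str, depth: float) -> str:
    """Place `needle` at fractional `depth` (0.0 = start, 1.0 = end) of `haystack`."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    pos = int(len(haystack) * depth)
    if pos >= len(haystack):
        # depth 1.0 (or empty haystack): append after the last sentence
        return haystack + " " + needle
    # Snap back to the previous sentence boundary so the needle never splits a sentence.
    boundary = haystack.rfind(". ", 0, pos)
    cut = 0 if boundary == -1 else boundary + 2
    return haystack[:cut] + needle + " " + haystack[cut:]


def build_prompt(haystack: str, needle: str, question: str, rng: random.Random):
    """Insert the needle at a random depth and wrap the result in a retrieval prompt."""
    depth = rng.random()
    context = insert_needle(haystack, needle, depth)
    prompt = f"{context}\n\nQuestion: {question}\nAnswer using only the document above."
    return prompt, depth


def is_retrieved(response: str, expected: str) -> bool:
    """Assumed scoring rule: the expected fact appears in the response (case-insensitive)."""
    return expected.lower() in response.lower()


# Hypothetical filler text and needle, for illustration only.
haystack = "The sky was grey. The train was late. Nobody minded. " * 50
needle = "The vault code is 7291."
prompt, depth = build_prompt(haystack, needle, "What is the vault code?", random.Random(0))
```

The resulting `prompt` would be sent to the model under test, and `is_retrieved(model_response, "7291")` would mark the task as passed or failed. A stricter or model-graded scorer could be swapped in without changing the insertion step.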

Publisher: Greg Kamradt
Format: Custom
Size: Configurable; a typical sweep covers ~15 context lengths × 15 depths × ~10 needles, with one task per combination
License: MIT
Published: Nov 2023
Notable for: Benchmark for evaluating factual recall and long-context retrieval.
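To make the sweep size concrete, the sketch below enumerates a grid of context lengths, depths, and needles and counts the resulting tasks. The specific values (1,000-token steps, evenly spaced depths, the needle sentences) and the name `build_sweep` are assumptions chosen for illustration; the page only states the approximate counts.

```python
from itertools import product


def build_sweep(context_lengths, depths, needles):
    """Return one task per (context length, depth, needle) combination."""
    return [
        {"context_length": length, "depth": depth, "needle": needle}
        for length, depth, needle in product(context_lengths, depths, needles)
    ]


# Assumed grid: 15 context lengths, 15 evenly spaced depths, 10 needles.
context_lengths = [1000 * (i + 1) for i in range(15)]  # 1,000 .. 15,000 (hypothetical units: tokens)
depths = [i / 14 for i in range(15)]                   # 0.0 .. 1.0 inclusive
needles = [f"The magic number for item {i} is {1000 + i}." for i in range(10)]

tasks = build_sweep(context_lengths, depths, needles)
print(len(tasks))  # 15 * 15 * 10 = 2250
```

Because the size is configurable, shrinking any one axis reduces the task count proportionally, which is a cheap way to run a quick smoke test before a full sweep.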

Related tools (2)

Implementations, trainers, datasets and scaffolds linked to this eval.

Papers (2)

FAQ

What is Needle in a Haystack (NIAH)?
It is a long-context retrieval pressure test: a random fact (the "needle") is inserted at a random depth inside a long document (the "haystack"), and the model is asked to retrieve it verbatim.
What capabilities does Needle in a Haystack (NIAH) test?
Needle in a Haystack (NIAH) evaluates factual recall and long-context retrieval.
How can a model improve its Needle in a Haystack (NIAH) score?
Two tools linked to Needle in a Haystack (NIAH) on Sophon target this eval: Haystack RLM RL Env (Prime Intellect) and Context Needle RL Env (Community). Linked tools can include RL environments, datasets, and scaffolds.
What license is Needle in a Haystack (NIAH) under?
Needle in a Haystack (NIAH) is available under the MIT license.