Needle in a Haystack (NIAH)
Long-context retrieval stress test: a random fact (the "needle") is inserted at a random depth inside a long document (the "haystack"), and the model is asked to retrieve it verbatim.
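The insert-and-retrieve procedure described above can be sketched in a few lines. This is a minimal illustration, not the benchmark's own code: the function names (`insert_needle`, `build_prompt`, `retrieved_verbatim`), the prompt wording, and the choice to measure depth in whitespace-separated words are all assumptions made for the example. A real harness would likely measure length and depth in tokens.

```python
def insert_needle(haystack: str, needle: str, depth: float) -> str:
    """Place `needle` at a fractional `depth` of `haystack`.

    depth = 0.0 puts the needle at the very start, 1.0 at the very end.
    Depth is measured in whitespace-separated words (an assumption made
    for simplicity in this sketch).
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    words = haystack.split()
    cut = round(len(words) * depth)
    return " ".join(words[:cut] + [needle] + words[cut:])


def build_prompt(haystack: str, needle: str, depth: float, question: str) -> str:
    """Wrap the needle-bearing context in a retrieval question (hypothetical wording)."""
    context = insert_needle(haystack, needle, depth)
    return f"{context}\n\nQuestion: {question}\nAnswer using only the text above."


def retrieved_verbatim(answer: str, expected: str) -> bool:
    """Case-insensitive check that the expected fact appears in the model's answer."""
    return expected.lower() in answer.lower()
```

To randomize the depth, as the description calls for, a caller could pass `random.Random(seed).random()` as `depth`. The substring check is one simple way to score verbatim retrieval; other scoring schemes are possible.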
- Publisher: Greg Kamradt
- Capabilities: Factual Recall, Long Context
- Format: Custom
- Size: configurable; a typical sweep is ~15 context lengths × 15 depths × ~10 needles, one task per combination
- License: MIT
- Published: Nov 2023
- Notable for: evaluating verbatim factual recall across long contexts
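For a sense of scale, the typical sweep size quoted above works out to 15 × 15 × 10 = 2,250 tasks. The sketch below enumerates such a grid; the specific context lengths, depth values, and needle labels are placeholders chosen for illustration, and `build_sweep` is a hypothetical helper rather than part of any published harness.

```python
from itertools import product


def build_sweep(context_lengths, depths, needles):
    """Return one task per (context length, depth, needle) combination."""
    return [
        {"context_length": length, "depth": depth, "needle": needle}
        for length, depth, needle in product(context_lengths, depths, needles)
    ]


# Placeholder grid values (assumptions): 15 lengths, 15 depths, 10 needles.
context_lengths = [1_000 * (i + 1) for i in range(15)]  # 1k .. 15k
depths = [i / 14 for i in range(15)]                    # 0.0 .. 1.0, evenly spaced
needles = [f"needle-{i}" for i in range(10)]

tasks = build_sweep(context_lengths, depths, needles)
print(len(tasks))  # 2250
```

Because the task count is the product of the three axes, adding context lengths or depths grows the evaluation cost multiplicatively.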
Related tools (2)
Implementations, trainers, datasets and scaffolds linked to this eval.
Papers (2)
FAQ
- What is Needle in a Haystack (NIAH)?
- A long-context retrieval stress test: a random fact (the "needle") is inserted at a random depth inside a long document (the "haystack"), and the model is asked to retrieve it verbatim.
- What capabilities does Needle in a Haystack (NIAH) test?
- Needle in a Haystack (NIAH) evaluates factual recall and long-context handling.
- How can a model improve its Needle in a Haystack (NIAH) score?
- Tools linked to Needle in a Haystack (NIAH) on Sophon include Haystack RLM RL Env (Prime Intellect) and Context Needle RL Env (Community). Linked tools are RL environments, datasets, and scaffolds that target this eval.
- What license is Needle in a Haystack (NIAH) under?
- Needle in a Haystack (NIAH) is available under the MIT license.