Needle in a Haystack (NIAH): In-Context Retrieval Benchmark for Long Context LLMs
Active
NIAH evaluates the in-context retrieval ability of long-context LLMs by testing whether a model can extract factual information from long-context inputs.
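The idea can be sketched in a few lines of Python. This is a minimal illustration only, not the benchmark's own harness: the function names `build_haystack` and `score_retrieval`, the filler sentence, and the needle text are all hypothetical, and the sketch assumes word-based lengths and a simple substring check, whereas a real harness may measure length in tokens and grade answers differently.

```python
def build_haystack(filler: str, needle: str, depth: float, target_words: int) -> str:
    """Build a long context of about target_words filler words with the
    needle inserted at a relative depth (0.0 = start, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    words = filler.split()
    if not words:
        raise ValueError("filler must contain at least one word")
    repeats = -(-target_words // len(words))  # ceiling division
    haystack = (words * repeats)[:target_words]
    position = round(depth * len(haystack))
    return " ".join(haystack[:position] + [needle] + haystack[position:])


def score_retrieval(answer: str, expected: str) -> bool:
    """Assumed scoring rule: case-insensitive substring match."""
    return expected.lower() in answer.lower()


if __name__ == "__main__":
    # Hypothetical needle and filler, chosen only for illustration.
    needle = "The secret code is 7421."
    context = build_haystack(
        "The quick brown fox jumps over the lazy dog.",
        needle,
        depth=0.5,
        target_words=1000,
    )
    prompt = f"{context}\n\nWhat is the secret code?"
    # Send `prompt` to the model under test, then score its reply, e.g.:
    # score_retrieval(model_reply, "7421")
```

Sweeping `depth` and `target_words` over a grid and recording the score at each point shows how retrieval varies with context length and needle position.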
- Publisher: Greg Kamradt
- Domain: Reasoning
- License: MIT
- Published: Oct 2024
- Notable for: Benchmark for evaluating Reasoning.
Related tools
2 implementations, trainers, datasets, and scaffolds linked to this eval.
FAQ
- What is Needle in a Haystack (NIAH): In-Context Retrieval Benchmark for Long Context LLMs?
- NIAH evaluates the in-context retrieval ability of long-context LLMs by testing whether a model can extract factual information from long-context inputs.
- How can a model improve its Needle in a Haystack (NIAH): In-Context Retrieval Benchmark for Long Context LLMs score?
- Tools on Sophon that target this eval include Haystack RLM RL Env (Prime Intellect) and Context Needle RL Env (Community). These are RL environments, datasets, and scaffolds linked to NIAH.
- What license is Needle in a Haystack (NIAH): In-Context Retrieval Benchmark for Long Context LLMs under?
- Needle in a Haystack (NIAH): In-Context Retrieval Benchmark for Long Context LLMs is available under the MIT license.