Needle in a Haystack (NIAH): In-Context Retrieval Benchmark for Long Context LLMs

Active

NIAH evaluates the in-context retrieval ability of long-context LLMs: the model must extract a specific piece of factual information (the needle) from a long input (the haystack).
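To make the idea concrete, here is a minimal sketch of how a needle-in-a-haystack test case might be built. It is an illustration and not the benchmark's own code. The function names (`build_prompt`, `is_retrieved`, `sweep`), the use of word counts in place of tokenizer tokens, and the substring-match scoring are all assumptions made for this example.

```python
from typing import Dict, Iterable, Tuple


def build_prompt(filler: str, needle: str, context_words: int, depth: float) -> str:
    """Build a haystack of about `context_words` words with `needle` inserted.

    `depth` is the relative insertion point: 0.0 is the start, 1.0 is the end.
    Lengths are counted in whitespace-separated words (an assumption; a real
    harness would normally count tokens with the model's tokenizer).
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    words = filler.split()
    if not words:
        raise ValueError("filler must contain at least one word")
    # Repeat the filler until it covers the requested length, then truncate.
    repeats = context_words // len(words) + 1
    haystack = (words * repeats)[:context_words]
    position = int(len(haystack) * depth)
    return " ".join(haystack[:position] + [needle] + haystack[position:])


def is_retrieved(model_answer: str, expected: str) -> bool:
    """Hypothetical scoring rule: case-insensitive substring match."""
    return expected.lower() in model_answer.lower()


def sweep(
    filler: str, needle: str, lengths: Iterable[int], depths: Iterable[float]
) -> Dict[Tuple[int, float], str]:
    """Build one prompt per (context length, depth) pair."""
    depths = list(depths)
    return {
        (n, d): build_prompt(filler, needle, n, d)
        for n in lengths
        for d in depths
    }


if __name__ == "__main__":
    prompts = sweep(
        filler="The quick brown fox jumps over the lazy dog.",
        needle="The secret code is 4921.",
        lengths=[100, 1000],
        depths=[0.0, 0.5, 1.0],
    )
    print(len(prompts), "prompts built")
```

Each prompt would be sent to the model along with a question about the needle, and the answer scored with `is_retrieved`. Sweeping over both context length and depth shows whether retrieval degrades at particular positions in the input.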

Publisher: Greg Kamradt
Domain: Reasoning
License: MIT
Published: Oct 2024
Notable for: Benchmark for evaluating reasoning.

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is Needle in a Haystack (NIAH): In-Context Retrieval Benchmark for Long Context LLMs?
NIAH evaluates the in-context retrieval ability of long-context LLMs by testing whether a model can extract factual information from long inputs.
How can a model improve its Needle in a Haystack (NIAH): In-Context Retrieval Benchmark for Long Context LLMs score?
Sophon links two tools that target this eval: Haystack RLM RL Env (Prime Intellect) and Context Needle RL Env (Community). Linked tools can include RL environments, datasets, and scaffolds.
What license is Needle in a Haystack (NIAH): In-Context Retrieval Benchmark for Long Context LLMs under?
Needle in a Haystack (NIAH): In-Context Retrieval Benchmark for Long Context LLMs is available under the MIT License.