long-context
Independent
Classic long-context stress test that hides a short fact ("the needle") inside a long document and asks the model to retrieve it.
Greg Kamradt
Long-context retrieval pressure test: insert a random fact (the "needle") at a random depth inside a long document (the "haystack") and ask the model to retrieve it verbatim.
Needle-in-haystack: locate a target sentence in a long document.
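The pressure test above can be sketched in a few lines of Python. This is a minimal illustration, not Kamradt's actual harness. It assumes sentence-level filler text, a needle placed at a fractional depth (0.0 = start, 1.0 = end), and a simple normalized substring check as the scorer. The function names `insert_needle`, `build_prompt` and `retrieved_verbatim`, the prompt wording, and the filler and needle strings are all hypothetical.

```python
import random


def insert_needle(filler, needle, depth):
    """Return a copy of the filler sentences with `needle` inserted at fractional `depth`.

    depth=0.0 puts the needle first; depth=1.0 puts it last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    idx = round(depth * len(filler))
    return filler[:idx] + [needle] + filler[idx:]


def build_prompt(filler, needle, question, depth):
    """Join the haystack into one document and append the retrieval question (hypothetical wording)."""
    haystack = " ".join(insert_needle(filler, needle, depth))
    return f"{haystack}\n\nQuestion: {question}\nAnswer with the exact sentence from the document."


def _normalize(text):
    # Lowercase and collapse runs of whitespace so formatting noise is not penalized.
    return " ".join(text.lower().split())


def retrieved_verbatim(response, needle):
    """Assumed scorer: True if the response contains the needle text, ignoring case and extra whitespace."""
    return _normalize(needle) in _normalize(response)


if __name__ == "__main__":
    rng = random.Random(0)
    filler = [f"Filler sentence number {i}." for i in range(1000)]
    needle = "The secret ingredient in the recipe is smoked paprika."
    depth = rng.random()  # random insertion depth, as the test description specifies
    prompt = build_prompt(filler, needle, "What is the secret ingredient?", depth)
    print(f"depth={depth:.2f}, prompt length={len(prompt)} chars")
    # A real harness would send `prompt` to a model; here a correct reply is faked.
    print(retrieved_verbatim("The secret ingredient in the recipe is smoked paprika.", needle))
```

In a full sweep, one would typically vary both the haystack length and the depth over a grid and record the pass rate for each cell. The substring scorer shown here is a deliberately simple stand-in; other evaluators may grade answers differently.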
Prime Intellect
Needle-in-haystack environment using an RLM with a Python REPL.