Evaluating Large Language Models Trained on Code (pass@k formulation)
The Codex paper that formalized pass@k, the probability that at least one of k sampled solutions to a problem passes its unit tests, and gave an unbiased estimator for it. pass@k is now the de facto scoring metric for code-generation evals.
- Publisher: OpenAI
- Year: 2021
- Venue: preprint
- Authors: 4
- Hosting: External source, license unknown
Introduces 1 artifact - 1 eval
TL;DR (Semantic Scholar)
Repeated sampling from a GPT language model fine-tuned on code is a surprisingly effective strategy for producing working solutions to difficult prompts. The paper also discusses the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
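The benefit of repeated sampling is what pass@k quantifies. As a minimal sketch, assuming n samples are drawn per problem and c of them pass the unit tests, the paper's unbiased estimate for one problem is 1 - C(n-c, k) / C(n, k), averaged over all problems. The function names `pass_at_k` and `mean_pass_at_k` and the sample counts below are illustrative assumptions, not the paper's released code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: total samples generated for the problem
    c: number of those samples that pass the unit tests
    k: sample budget being scored (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset is all failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs (hypothetical helper)."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Illustrative counts only: two problems, 10 samples each.
    results = [(10, 3), (10, 0)]
    for k in (1, 5, 10):
        print(f"pass@{k} = {mean_pass_at_k(results, k):.3f}")
```

For k = 1 the estimate reduces to c/n, the plain fraction of passing samples. Drawing n larger than k and applying the formula gives a lower-variance estimate than sampling exactly k solutions and checking whether any passes.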
Artifacts
Evals (1)