Evaluating Large Language Models Trained on Code
The Codex paper, which introduces HumanEval (164 hand-written Python problems), gives an unbiased estimator for the pass@k metric, and presents the model family whose production descendant powers GitHub Copilot.
- Publisher
- OpenAI
- Year
- 2021
- Venue
- preprint
- Authors
- 6
- Hosting
- External source, license unknown
Introduces 1 artifact - 1 model
TL;DR
Semantic Scholar
Repeated sampling from the model turns out to be a surprisingly effective strategy for producing working solutions to difficult prompts. The paper also discusses the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
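The pass@k metric mentioned above is what makes the repeated-sampling result measurable: generate n samples per problem, count the c samples that pass the unit tests, and estimate the probability that at least one of k randomly drawn samples is correct as 1 - C(n-c, k) / C(n, k). The following is a minimal sketch of that estimator, not the paper's reference implementation; the function name `pass_at_k` and the input validation are assumptions made for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated for the problem
    c: number of those samples that passed the unit tests
    k: sample budget being evaluated (k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("require 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets made up only of failing samples.
    # math.comb returns 0 when k > n - c, so the estimate is then 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate over (n, c) pairs (hypothetical helper)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, `pass_at_k(10, 3, 1)` reduces to the plain pass rate c / n, while larger k rewards a model that solves a problem in only a small fraction of its samples. A benchmark score is the mean of the per-problem estimates, which `mean_pass_at_k` illustrates.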
Artifacts
1 Model