Evaluating Large Language Models Trained on Code

The Codex paper that introduces HumanEval (164 hand-written Python problems) and the pass@k metric, and presents the model behind GitHub Copilot.

Publisher: OpenAI
Year: 2021
Venue: preprint
Authors: 6
Hosting: External source, license unknown


Introduces 1 artifact - 1 model

TL;DR

Semantic Scholar

Repeated sampling from the model is found to be a surprisingly effective strategy for producing working solutions to difficult prompts. The paper also discusses the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
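
As an illustration of how repeated sampling is scored, pass@k is commonly computed with an unbiased estimator: draw n samples per problem, count the c samples that pass the unit tests, and estimate the probability that at least one of k samples is correct as 1 - C(n-c, k) / C(n, k). The following is a minimal sketch under that assumption; the function name and parameter names are illustrative and are not taken from this page.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated for the problem
    c: number of those samples that pass all unit tests
    k: sample budget being evaluated (must satisfy 1 <= k <= n)
    """
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    # If fewer than k samples fail, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For a benchmark such as HumanEval, the per-problem estimates would be averaged over all problems, as `mean_pass_at_k` does for a hypothetical list of (n, c) pairs.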
