
Evaluating Large Language Models Trained on Code (pass@k formulation)

The Codex paper that introduced the unbiased estimator for pass@k (the probability that at least one of k sampled programs passes a problem's unit tests), now the standard scoring metric for code-generation evals.
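For reference, the pass@k estimator can be sketched in Python. As formulated in the paper, one draws n >= k samples per problem, counts the c samples that pass the unit tests, and averages 1 - C(n-c, k) / C(n, k) over problems. The sketch below evaluates that ratio as a running product so it never builds large binomial coefficients. It is a minimal illustration, not the paper's released code; the names `pass_at_k` and `mean_pass_at_k` are our own.

```python
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains a correct one.
        return 1.0
    # C(n-c, k) / C(n, k) == prod_{i=n-c+1}^{n} (1 - k / i):
    # the probability that k samples drawn without replacement all fail.
    prob_all_fail = 1.0
    for i in range(n - c + 1, n + 1):
        prob_all_fail *= 1.0 - k / i
    return 1.0 - prob_all_fail


def mean_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Benchmark-level pass@k: average the per-problem estimate over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimate reduces to c / n, the plain fraction of passing samples; for larger k it counts subsets without replacement, unlike plugging c / n into 1 - (1 - p)^k.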

Publisher: OpenAI
Year: 2021
Venue: preprint
Authors: 4
Hosting: External source, license unknown



Introduces 1 artifact (1 eval)

TL;DR (Semantic Scholar)

Repeated sampling from the GPT language model is found to be a surprisingly effective strategy for producing working solutions to difficult prompts. The paper also discusses the potential broader impacts of deploying powerful code-generation technologies, covering safety, security, and economics.
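To illustrate why repeated sampling helps, the sketch below evaluates the unbiased pass@k estimator exactly, as 1 - C(n-c, k) / C(n, k), using `math.comb` and `fractions.Fraction`, for a hypothetical problem where only 2 of 200 sampled programs pass the unit tests. The sample counts are made up for illustration and are not results from the paper; `pass_at_k_exact` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb


def pass_at_k_exact(n: int, c: int, k: int) -> Fraction:
    """Exact unbiased pass@k: 1 - C(n-c, k) / C(n, k), for 1 <= k <= n."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, which yields pass@k = 1 in that case.
    return 1 - Fraction(comb(n - c, k), comb(n, k))


# Hypothetical problem: 2 of 200 samples pass the unit tests.
for k in (1, 10, 100):
    print(f"pass@{k} = {float(pass_at_k_exact(200, 2, k)):.3f}")
```

With these counts it prints pass@1 = 0.010, pass@10 = 0.098 and pass@100 = 0.751: a problem that a single sample almost never solves is solved in most 100-sample draws.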
