Evaluating Large Language Models Trained on Code
OpenAI's foundational Codex paper introducing HumanEval, the pass@k metric, and the Codex model line that powered the original GitHub Copilot.
- Publisher: OpenAI
- Year: 2021
- Venue: preprint
- Authors: 58
- Hosting: External source, license unknown
TL;DR
Semantic Scholar
The paper finds that repeated sampling from Codex, a GPT language model fine-tuned on code, is a surprisingly effective strategy for producing working solutions to difficult prompts. It also discusses the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
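The repeated-sampling result is what the pass@k metric measures: the probability that at least one of k sampled solutions passes the unit tests. A minimal sketch of the unbiased estimator, 1 - C(n-c, k) / C(n, k), computed from n samples of which c pass; the function name `pass_at_k` and the example numbers are illustrative assumptions, not taken from this page.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated for the problem
    c: number of those samples that pass the unit tests
    k: sampling budget being evaluated (k <= n)
    """
    if n < 1 or not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require n >= 1, 0 <= c <= n, and 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset is all failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts: 200 samples, 12 correct.
    for k in (1, 10, 100):
        print(k, round(pass_at_k(200, 12, k), 4))
```

A benchmark-level score would average this per-problem estimate over all problems. Estimating from n > k samples, rather than drawing exactly k, reduces the variance of the estimate.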
Authors
58 authors: Alec Radford, Alethea Power, Alex Nichol, Alex Paino, Alex Ray, Andrew Carr, Ariel Herbert-Voss, Bob McGrew, Brooke Chan, Christopher Hesse, Clemens Winter, Dario Amodei, Dave Cummings, Elizabeth Barnes (Beth Barnes), Evan Morikawa, Felipe Petroski Such, Fotios Chantzis, Girish Sastry, Greg Brockman, Gretchen Krueger, Harri Edwards, Heewoo Jun, Heidy Khlaaf, Henrique Ponde de Oliveira Pinto, Igor Babuschkin, Ilya Sutskever, Jan Leike, Jared Kaplan, Jerzy "Jerry" Tworek, Jie Tang, Joshua Achiam, Katie Mayer, Łukasz Kaiser, Mark Chen, Matthew Knight, Matthias Plappert, Michael Petrov, Mikhail Pavlov, Miles Brundage, Mira Murati, Mohammad Bavarian, Nicholas Joseph, Nick Ryder, Nikolas Tezak, Pamela Mishkin, Peter Welinder, Philippe Tillet, Qiming Yuan, Raul Puri, Sam McCandlish, Scott Gray, Shantanu Jain, Suchir Balaji, Vedant Misra, William Hebgen Guss, William Saunders, Wojciech Zaremba, Yuri Burda