Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

Google Research paper that isolates 23 BIG-Bench tasks (BIG-Bench Hard, BBH) on which prior language models fell short of average human-rater performance, and shows that chain-of-thought prompting closes much of that gap.

Publisher: Google Research
Year: 2022
Venue: ACL
ArXiv: arxiv.org/abs/2210.09261
Code: github.com/suzgunmirac/BIG-Bench-Hard
Authors: 11
Hosting: External source, license unknown

Attribution

Abstract & full text: arxiv.org/abs/2210.09261
TL;DR: semanticscholar.org/paper/663a41c866d49ce052801fbc88947d39764cad29
Code: github.com/suzgunmirac/BIG-Bench-Hard

TL;DR

Semantic Scholar

This work finds that applying chain-of-thought (CoT) prompting to BBH tasks enables PaLM to surpass the average human-rater performance on 10 of the 23 tasks, and Codex to surpass it on 17 of the 23 tasks.

Authors

Aakanksha Chowdhery, Denny Zhou, Ed H. Chi, Hyung Won Chung, Jason Wei, Mirac Suzgun, Nathan Scales, Nathanael Schärli, Quoc V. Le, Sebastian Gehrmann, Yi Tay