Rethinking Reflection in Pre-Training

A language model's ability to reflect on its own reasoning provides a key advantage for solving complex problems. While most recent research has focused on how this ability develops during reinforcement learning, we show that it actually begins to emerge much earlier, during the model's pre-training. To study this, we introduce deliberate errors into chains-of-thought and test whether the model can still arrive at the correct answer by recognizing and correcting these mistakes. By tracking performance across different stages of pre-training, we observe that this self-correcting ability appears early and improves steadily over time. For instance, an OLMo2-7B model pre-trained on 4 trillion tokens displays self-correction on our six self-reflection tasks.
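The error-injection test described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the arithmetic example, the "Question: ... Answer: ..." prompt format, the rule of corrupting the last number in one reasoning step, and the helper names (`inject_error`, `build_prompt`, `extract_final_answer`, `is_self_corrected`) are all hypothetical. The model call is omitted; a continuation string stands in for whatever a model would generate after the corrupted prefix.

```python
from __future__ import annotations

import re

# Matches integers and decimals such as "12", "-3", "4.5".
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def inject_error(steps: list[str], index: int, wrong_value: str) -> list[str]:
    """Hypothetical helper: overwrite the last number in steps[index] with
    wrong_value and return the chain-of-thought prefix up to that step."""
    matches = list(NUMBER.finditer(steps[index]))
    if not matches:
        raise ValueError("step contains no number to corrupt")
    last = matches[-1]
    step = steps[index]
    corrupted = step[: last.start()] + wrong_value + step[last.end():]
    # The model is expected to continue from right after the faulty step.
    return steps[:index] + [corrupted]


def build_prompt(question: str, steps: list[str]) -> str:
    """Hypothetical prompt format: the question followed by the corrupted prefix."""
    return f"Question: {question}\nAnswer: " + " ".join(steps)


def extract_final_answer(text: str) -> str | None:
    """Treat the last number in the continuation as the model's final answer."""
    matches = NUMBER.findall(text)
    return matches[-1] if matches else None


def is_self_corrected(continuation: str, gold: str) -> bool:
    """The model counts as self-correcting if it still reaches the gold answer."""
    return extract_final_answer(continuation) == gold


# Illustrative example (not taken from the paper's tasks).
question = "Tom has 3 boxes of 12 apples and eats 5. How many apples are left?"
gold_steps = [
    "3 boxes of 12 apples is 3 * 12 = 36 apples.",
    "After eating 5, 36 - 5 = 31 apples remain.",
]
prefix = inject_error(gold_steps, 0, "38")  # deliberate error: 3 * 12 = 38
prompt = build_prompt(question, prefix)
```

A continuation such as "Wait, 3 * 12 is 36, not 38. So 36 - 5 = 31." is scored as self-corrected against the gold answer "31", while "So 38 - 5 = 33." is not. Scoring only the final answer keeps the check independent of how, or whether, the model phrases its reflection.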
- Year: 2025
- Venue: arXiv 2025
- Authors: 28
Attribution
- Abstract & full text: arxiv.org/abs/2504.04022
Authors
Ashish Vaswani, Khoi Nguyen, Essential AI, Andrew Hojel, Michael Pust, Tim Romanski, Yash Vanjani, Ritvik Kapila, Mohit Parmar, Adarsh Chaluvaraju, Anil Thomas, Ashish Tanwer, Darsh J Shah, Ishaan Shah, Karl Stratos, Kurt Smith, Michael Callahan, Peter Rushton, Philip Monk, Platon Mazarakis, Saurabh Srivastava, Somanshu Singla, Andrew Ma, Anthony Polloreno, Burhan Drak Sibai, Divya S Mansingka, Divya Shivaprasad, Mrinal Iyer