Stanislav Fort

Researcher in deep-learning theory, adversarial robustness, and AI safety; previously Anthropic and Google.

Role: researcher
Currently at: Anthropic
Twitter: twitter.com/stanislavfort
GitHub: github.com/stanislavfort
Scholar: scholar.google.com/citations
Papers: 4

Attribution

Affiliations & profile: scholar.google.com/citations

4 papers

Authored papers

Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness

arXiv 2024

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

preprint 2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

arXiv 2022

Constitutional AI: Harmlessness from AI Feedback

arXiv 2022

Affiliations

Currently at

Anthropic

researcher · frontier lab

Previously

Google (Alphabet Inc.)

frontier lab

Stanford Center for Research on Foundation Models (CRFM)

university lab

Frequent co-authors

from 4 papers

Amanda Askell

researcher

3 shared papers

Andy Jones

researcher

3 shared papers

Anna Chen

researcher

3 shared papers

Ben Mann

founder

3 shared papers

Catherine Olsson

researcher

3 shared papers

Danny Hernandez

researcher

3 shared papers

Dario Amodei

CEO

3 shared papers

Dawn Drain

researcher

3 shared papers

Deep Ganguli

researcher

3 shared papers

Jackson Kernion

researcher

3 shared papers