Stanislav Fort
Researcher in deep-learning theory, adversarial robustness, and AI safety; previously Anthropic and Google.
- Role: researcher
- Currently at: Anthropic
- Twitter: twitter.com/stanislavfort
- GitHub: github.com/stanislavfort
- Scholar: scholar.google.com/citations
- Papers: 4
Authored papers
Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness
arXiv 2024
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
preprint
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
arXiv 2022
Constitutional AI: Harmlessness from AI Feedback
arXiv 2022
Frequent co-authors
10 from 4 papers
Amanda Askell
researcher
Andy Jones
researcher
Anna Chen
researcher
Ben Mann
founder
Catherine Olsson
researcher
Danny Hernandez
researcher
Dario Amodei
CEO
Dawn Drain
researcher
Deep Ganguli
researcher
Jackson Kernion
researcher