Thomas Hartvigsen
- Papers: 13
Authored papers (13)
Sparse Autoencoder Features for Classifications and Transferability
arXiv 2025
ModelCitizens: Representing Community Voices in Online Safety
arXiv 2025
Lifelong Knowledge Editing requires Better Regularization
arXiv 2025
PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models
arXiv 2024
Composable Interventions for Language Models
arXiv 2024
Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes
arXiv 2024
MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations
arXiv 2024
Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks
arXiv 2024
Improving Black-box Robustness with In-Context Rewriting
arXiv 2024
Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding
arXiv 2024
Interpretable Unified Language Checking
arXiv 2023
Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
NeurIPS 2023 11
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
ACL 2022 5
Frequent co-authors
10 from 13 papers