Kyle Fish

2 papers

Authored papers

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

arXiv 2026

Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas

arXiv 2025

No known affiliations.

Co-authors (from 2 papers)

Christina Lu

Evan Hubinger

Jack Gallagher

Jack Lindsey

Jonathan Michala

Sharan Maiya

Sydney Levine

Yejin Choi (professor)

Yu Ying Chiu

Zhilin Wang (researcher)