Attribution
Designing a Dashboard for Transparency and Control of Conversational AI
arXiv 2024
Defending Against Unforeseen Failure Modes with Latent Adversarial Training
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
NeurIPS 2023 11
from 3 papers
Fernanda Viégas
Kenneth Li
Martin Wattenberg
Aoyu Wu
Catherine Yeh
Dylan Hadfield-Menell
Hanspeter Pfister
Jan Riecke
Lennart Schulze
Nicholas Castillo Marin