Attribution
Tradeoffs Between Alignment and Helpfulness in Language Models with Representation Engineering
arXiv 2024
Amnon Shashua
Dorin Shteyman
Noam Wies
Yoav Levine
Yotam Wolf