Forbidden Facts: An Investigation of Competing Objectives in Llama-2

LLMs often face competing pressures (for example, helpfulness vs. harmlessness). To understand how models resolve such conflicts, we study Llama-2-chat models on the forbidden fact task.

Year
2023
Hosting
External source, license unknown

Attribution

Abstract & full text
arxiv.org/abs/2312.08793v3
TL;DR
Semantic Scholar