Forbidden Facts: An Investigation of Competing Objectives in Llama-2
LLMs often face competing pressures (for example, helpfulness vs. harmlessness). To understand how models resolve such conflicts, we study Llama-2-chat models on the forbidden fact task.
- Year: 2023
- Hosting: External source (license unknown)