Forbidden Facts: An Investigation of Competing Objectives in Llama-2

LLMs often face competing pressures (for example helpfulness vs. harmlessness). To understand how models resolve such conflicts, we study Llama-2-chat models on the forbidden fact task.

Year: 2023
ArXiv: arxiv.org/abs/2312.08793
URL: arxiv.org/abs/2312.08793v3
Hosting: External source, license unknown

Attribution

Abstract & full text: arxiv.org/abs/2312.08793v3
TL;DR: Semantic Scholar
