Language Model Inversion

Language model inversion recovers prompt tokens from next-token probability distributions, reaching a BLEU of 59 and a token-level F1 of 78 on Llama-2 7b.

Year: 2023
Venue: arXiv 2023
Authors: 5

Attribution

Abstract & full text: arxiv.org/abs/2311.13647
TL;DR: Semantic Scholar

Abstract

Language models produce a distribution over the next token; can we use this information to recover the prompt tokens? We consider the problem of language model inversion and show that next-token probabilities contain a surprising amount of information about the preceding text. Often we can recover the text in cases where it is hidden from the user, motivating a method for recovering unknown prompts given only the model's current distribution output. We consider a variety of model access scenarios, and show how even without predictions for every token in the vocabulary we can recover the probability vector through search. On Llama-2 7b, our inversion method reconstructs prompts with a BLEU of 59 and token-level F1 of 78 and recovers 27% of prompts exactly. Code for reproducing all experiments is available at http://github.com/jxmorris12/vec2text.
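
The abstract says the probability vector can be recovered through search even when predictions are not available for every token, but it does not spell out the procedure. The block below is a minimal sketch of one way such a search could work, not the authors' implementation. It assumes a hypothetical API that returns only the most likely next token and accepts an additive per-token logit bias. The names `make_argmax_oracle` and `recover_probs` and the parameters `max_bias` and `steps` are illustrative. For each token, a binary search finds the smallest bias that makes it the argmax; that bias approximates its logit gap to the top token, and a softmax over the negated gaps gives the distribution.

```python
import math


def make_argmax_oracle(logits):
    """Simulate a restricted API (hypothetical): given a per-token bias,
    return only the id of the most likely token."""
    def oracle(bias):
        biased = [l + bias.get(i, 0.0) for i, l in enumerate(logits)]
        return max(range(len(biased)), key=biased.__getitem__)
    return oracle


def recover_probs(oracle, vocab_size, max_bias=50.0, steps=40):
    """Estimate the full next-token distribution using argmax-only queries.

    Assumes every logit lies within max_bias of the top logit.
    """
    top = oracle({})              # unbiased argmax
    gaps = [0.0] * vocab_size     # gaps[i] approximates logit[top] - logit[i]
    for i in range(vocab_size):
        if i == top:
            continue
        lo, hi = 0.0, max_bias
        for _ in range(steps):
            mid = (lo + hi) / 2
            if oracle({i: mid}) == i:
                hi = mid          # bias is enough to flip the argmax
            else:
                lo = mid          # bias is still too small
        gaps[i] = hi
    # Softmax over the negated gaps; the unknown top logit cancels out.
    unnorm = [math.exp(-g) for g in gaps]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Toy check on a four-token vocabulary with made-up logits.
true_logits = [2.0, 0.5, -1.0, 1.2]
recovered = recover_probs(make_argmax_oracle(true_logits), len(true_logits))
```

In this sketch the query cost grows as the vocabulary size times `steps`, so a real system would likely need batching or a coarser search.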