Modern LLMs are increasingly accessed through black-box APIs, which require users to transmit sensitive prompts, outputs, and fine-tuning data to external providers and thereby create a critical privacy risk at the API boundary. We introduce AlienLM, a deployable, API-only exposure-reduction layer that translates text into an Alien Language via a vocabulary-scale bijection, enabling lossless recovery on the client side. Using only standard fine-tuning APIs, Alien Adaptation Training (AAT) adapts target models to operate directly on alienized inputs. Across four LLM backbones and seven benchmarks, AlienLM retains over 81% of plaintext-oracle performance on average, substantially outperforming random-bijection and character-level baselines. Under adversaries with access to model weights, corpus statistics, and learning-based inverse translation, recovery attacks reconstruct fewer than 0.22% of alienized tokens. Our results demonstrate a practical pathway for privacy-aware LLM deployment under API-only access, substantially reducing plaintext exposure while maintaining task performance. Code and data are available at https://github.com/KimJaehee0725/AlienLM.
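The client-side round trip described above (alienize before the API call, invert afterwards) can be illustrated with a minimal sketch. It assumes the bijection is a secret, seeded random permutation over the token IDs of a fixed vocabulary, and it replaces a real tokenizer with a toy word list. Because the abstract reports that AlienLM outperforms random-bijection baselines, this uniform shuffle corresponds to that baseline, not to AlienLM's own mapping. All helper names (`make_bijection`, `invert`, `alienize`, `dealienize`, `recovery_rate`) are hypothetical.

```python
import random
from typing import List


def make_bijection(vocab_size: int, seed: int) -> List[int]:
    """Hypothetical helper: perm[i] is the alien ID assigned to plaintext token i.

    A seeded uniform shuffle is used purely for illustration (a random-bijection
    baseline); it is not AlienLM's construction.
    """
    perm = list(range(vocab_size))
    random.Random(seed).shuffle(perm)
    return perm


def invert(perm: List[int]) -> List[int]:
    # Build the inverse permutation so that inv[perm[i]] == i.
    inv = [0] * len(perm)
    for plain_id, alien_id in enumerate(perm):
        inv[alien_id] = plain_id
    return inv


def alienize(token_ids: List[int], perm: List[int]) -> List[int]:
    # Client side, before the API call: map every token ID through the secret bijection.
    return [perm[t] for t in token_ids]


def dealienize(alien_ids: List[int], inv: List[int]) -> List[int]:
    # Client side, after the API call: apply the inverse for lossless recovery.
    return [inv[a] for a in alien_ids]


def recovery_rate(guessed: List[int], truth: List[int]) -> float:
    # Hypothetical metric: fraction of positions an attacker reconstructs exactly.
    hits = sum(g == t for g, t in zip(guessed, truth))
    return hits / len(truth)


if __name__ == "__main__":
    vocab = ["<unk>", "the", "patient", "has", "a", "fever", "and", "cough"]
    to_id = {w: i for i, w in enumerate(vocab)}
    perm = make_bijection(len(vocab), seed=0)
    inv = invert(perm)

    plain = [to_id[w] for w in "the patient has a fever".split()]
    alien = alienize(plain, perm)
    restored = dealienize(alien, inv)

    print([vocab[a] for a in alien])     # scrambled tokens that leave the client
    print([vocab[t] for t in restored])  # original sentence recovered locally
    # A naive attacker who reads alien IDs as plaintext only "recovers" fixed points.
    print(recovery_rate(alien, plain))
    assert restored == plain
```

The naive identity-guess attack at the end is only a way to show how a token-level recovery rate is computed; the adversaries in the abstract (model weights, corpus statistics, learned inverse translation) are far stronger than this.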
AlienLM: Alienization of Language for API-Boundary Privacy in Black-Box LLMs
- Year: 2026
- Hosting: full text hosted (CC-BY-SA-4.0)
Attribution
- Abstract & full text: arxiv.org/abs/2601.22710 (CC-BY-SA-4.0)
- TL;DR: Semantic Scholar