Attribution
BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization
arXiv 2025
Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models
arXiv 2024
from 2 papers
Catherine Arnett
Max Bartolo