VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination

Year: 2026
Full text: arxiv.org/abs/2606.17999 (CC-BY-4.0)

Abstract

Masked diffusion language models (MDLMs) generate text by denoising a preallocated masked response canvas, which makes response-length modeling central to instruction tuning. Existing MDLMs often inherit the autoregressive convention of padding with repeated [EOS] tokens during instruction tuning, so [EOS] serves a dual role as both a semantic terminator and a padding token. We show that this dual role is a root cause of [EOS] overflow under large-block decoding. To decouple the two roles, we propose VoidPadding, which introduces a [VOID] token for padding and reserves [EOS] for termination. During inference, the learned [EOS] signal enables early stopping, while the learned [VOID] signal guides adaptive expansion of the response canvas. On Dream-7B-Instruct, VoidPadding improves the block-size-averaged four-task mean across mathematical reasoning and code generation benchmarks by +17.84 points over the original model and +6.95 points over RainbowPadding, while reducing the number of function evaluations (NFE) used in decoding by 55.7% on average. Code is available at https://github.com/Haru-LCY/VoidPadding.
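
To make the difference between the two padding conventions concrete, the following minimal sketch builds a fixed-length training target both ways. It is an illustration only, not the authors' implementation: the function names `pad_with_eos` and `pad_with_void`, the string tokens, and the assumption that exactly one [EOS] precedes the [VOID] padding are all hypothetical choices made for this example.

```python
from typing import List

EOS = "[EOS]"    # semantic terminator
VOID = "[VOID]"  # dedicated padding token (layout below is an assumption)


def pad_with_eos(response: List[str], canvas_len: int) -> List[str]:
    """Inherited convention: [EOS] both terminates and pads the canvas."""
    if len(response) + 1 > canvas_len:
        raise ValueError("response plus terminator does not fit the canvas")
    return response + [EOS] * (canvas_len - len(response))


def pad_with_void(response: List[str], canvas_len: int) -> List[str]:
    """Decoupled convention (assumed layout): one [EOS], then [VOID] padding."""
    if len(response) + 1 > canvas_len:
        raise ValueError("response plus terminator does not fit the canvas")
    n_void = canvas_len - len(response) - 1
    return response + [EOS] + [VOID] * n_void


if __name__ == "__main__":
    tokens = ["The", "answer", "is", "4"]
    print(pad_with_eos(tokens, 8))
    print(pad_with_void(tokens, 8))
```

Under the first convention the number of [EOS] targets grows with the unused canvas length, whereas under the second [EOS] appears exactly once regardless of canvas size, which is the decoupling the abstract describes.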