
Training Noise Token Pruning

TNT Pruning relaxes discrete token dropping in vision transformers to continuous additive noise, which allows smooth optimization during training while retaining the computational benefits of token dropping at deployment.

Year
2024
Venue
arXiv 2024
Authors
3

Abstract & full text
arxiv.org/abs/2411.18092

Abstract

In this work, we present Training Noise Token (TNT) Pruning for vision transformers. Our method relaxes the discrete token-dropping condition to continuous additive noise, which provides smooth optimization during training while retaining the computational gains of discrete token dropping in deployment settings. We provide theoretical connections to the Rate-Distortion literature, as well as empirical evaluations on the ImageNet dataset using ViT and DeiT architectures that demonstrate TNT's advantages over previous pruning methods.
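The contrast between a continuous training-time relaxation and discrete deployment-time dropping can be illustrated with a small NumPy sketch. This is not the authors' formulation: the per-token `keep_scores`, the linear noise schedule `sigma = max_sigma * (1 - score)`, the Gaussian noise, and the helper names `train_forward` and `deploy_forward` are all hypothetical choices made for illustration. In a real model the scores would be produced by a learnable module and differentiated through with an autodiff framework. The key idea is that here the noise magnitude is a smooth function of the score, whereas a hard top-k mask is not.

```python
import numpy as np


def train_forward(tokens, keep_scores, rng, max_sigma=1.0):
    """Hypothetical training-time relaxation (illustrative only).

    Instead of removing low-score tokens, perturb every token with Gaussian
    noise whose standard deviation grows as its keep score shrinks:
    score 1.0 -> no noise, score 0.0 -> std of max_sigma.
    """
    sigma = max_sigma * (1.0 - keep_scores)      # per-token noise std, shape (N,)
    noise = rng.standard_normal(tokens.shape)    # i.i.d. noise, shape (N, D)
    return tokens + sigma[:, None] * noise       # broadcast std over the D features


def deploy_forward(tokens, keep_scores, k):
    """Deployment-time discrete dropping: keep only the k highest-scoring tokens.

    Downstream layers then process k tokens instead of N, which is where the
    computational savings come from.
    """
    idx = np.argsort(keep_scores)[::-1][:k]      # indices of the top-k scores
    idx = np.sort(idx)                           # restore original token order
    return tokens[idx], idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.standard_normal((6, 4))         # 6 tokens, 4 features each
    scores = np.array([0.9, 0.1, 0.8, 1.0, 0.3, 0.05])
    noisy = train_forward(tokens, scores, rng)
    kept, idx = deploy_forward(tokens, scores, k=3)
    print(noisy.shape, kept.shape, idx)          # (6, 4) (3, 4) [0 2 3]
```

During training, every token is still processed but low-score tokens carry mostly noise. At deployment, the same scores select which tokens survive, so the sequence length actually shrinks.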
