Attribution
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders
arXiv 2022
from 2 papers
Du-Seong Chang
Jungwook Choi
Minsoo Kim
Sukjin Hong
Janghwan Lee
Wonyong Sung