Attribution
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers
arXiv 2023
from 2 papers
Jungwook Choi
Minsoo Kim
Du-Seong Chang
Janghwan Lee
Kyuhong Shim
Seongmin Park
Sihwa Lee
Sukjin Hong