Tuo Zhao

AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

arXiv 2025

Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data

arXiv 2025

Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models

arXiv 2025

NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

arXiv 2025

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM

arXiv 2024

To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO

arXiv 2024

Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering

arXiv 2024

RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained Language Model for Knowledge Editing and Fine-tuning

arXiv 2024

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

arXiv 2023

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

arXiv 2023

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

arXiv 2023

Deep Reinforcement Learning from Hierarchical Preference Design

arXiv 2023

Machine Learning Force Fields with Data Cost Aware Training

arXiv 2023