Attribution
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning
arXiv 2024
Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks
arXiv 2021
from 2 papers
April Yang
Colin Unger
Gabriele Oliaro
Kaisheng Ma
Linfeng Zhang
Remi Delacourt
Runpei Dong
Ruohan Gao
Vineeth Kada
Xinhao Cheng