Mengdi Wu

Attribution

2 papers

Authored papers

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning

arXiv 2024

Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks

arXiv 2021

No known affiliations.

Co-authors (from 2 papers)

April Yang

Colin Unger

Gabriele Oliaro

Kaisheng Ma

Linfeng Zhang

Remi Delacourt

Runpei Dong

Ruohan Gao

Vineeth Kada

Xinhao Cheng