Zhenglun Kong
- Papers: 9
Authored papers (9)
Democratizing AI scientists using ToolUniverse
arXiv 2025
Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality
arXiv 2025
Enabling Flexible Multi-LLM Integration for Scalable Knowledge Aggregation
arXiv 2025
Fully Open Source Moxin-7B Technical Report
arXiv 2024
Fast and Memory-Efficient Video Diffusion Using Streamlined Inference
arXiv 2024
Search for Efficient Large Language Models
arXiv 2024
EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge
arXiv 2024
Rethinking Token Reduction for State Space Models
arXiv 2024
Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
arXiv 2023
Frequent co-authors
10 from 9 papers