Jae-Joon Kim
- Papers: 9
Authored papers
LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents
arXiv 2026
CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection
arXiv 2026
RelayGen: Intra-Generation Model Switching for Efficient Reasoning
arXiv 2026
Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection
arXiv 2026
Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning
arXiv 2025
LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning
arXiv 2025
FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation
arXiv 2025
QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models
arXiv 2025
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
arXiv 2024
Frequent co-authors
10 from 9 papers