Lu Hou
Papers: 14
Authored papers (14)
Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging
arXiv 2025
DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning
arXiv 2025
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement
arXiv 2025
The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs
arXiv 2025
InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search
arXiv 2025
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
arXiv 2025
FlatQuant: Flatness Matters for LLM Quantization
arXiv 2024
TempCompass: Do Video LLMs Really Understand Videos?
arXiv 2024
Visually Guided Generative Text-Layout Pre-training for Document Intelligence
arXiv 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
CVPR 2025 1
IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact
arXiv 2024
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
CVPR 2024 1
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
arXiv 2023
TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
arXiv 2023
Frequent co-authors
10 from 14 papers