Bin Li
- Papers: 24
Authored papers
Efficient Autoregressive Video Diffusion with Dummy Head
arXiv 2026
Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis
arXiv 2026
Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding
arXiv 2025
Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
arXiv 2025
Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs
arXiv 2025
UMIT: Unifying Medical Imaging Tasks via Vision-Language Models
arXiv 2025
When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways
arXiv 2025
Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models
arXiv 2025
KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints
arXiv 2025
AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference
arXiv 2025
Neural Video Compression with Feature Modulation
CVPR 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
arXiv 2024
GM-DF: Generalized Multi-Scenario Deepfake Detection
arXiv 2024
Accelerating Data Generation for Neural Operators via Krylov Subspace Recycling
arXiv 2024
An Efficient Watermarking Method for Latent Diffusion Models via Low-Rank Adaptation
arXiv 2024
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
arXiv 2023
Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks
arXiv 2023
Efficient Backdoor Attacks for Deep Neural Networks in Real-world Scenarios
arXiv 2023
Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning
ICCV 2023
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
CVPR 2023
Large Language Models are Better Reasoners with Self-Verification
arXiv 2022
Learning to Locate Visual Answer in Video Corpus Using Question
arXiv 2022
Deformable DETR: Deformable Transformers for End-to-End Object Detection
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
ICLR 2020
Frequent co-authors
10 from 24 papers