Bin Li

Authored papers

24

Efficient Autoregressive Video Diffusion with Dummy Head

arXiv 2026

2026

Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis

arXiv 2026

2026

Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding

arXiv 2025

2025

Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding

arXiv 2025

2025

Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs

arXiv 2025

2025

UMIT: Unifying Medical Imaging Tasks via Vision-Language Models

arXiv 2025

2025

When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways

arXiv 2025

2025

Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models

arXiv 2025

2025

KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints

arXiv 2025

2025

AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference

arXiv 2025

2025

Neural Video Compression with Feature Modulation

CVPR 2024

2024

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

arXiv 2024

2024

GM-DF: Generalized Multi-Scenario Deepfake Detection

arXiv 2024

2024

Accelerating Data Generation for Neural Operators via Krylov Subspace Recycling

arXiv 2024

2024

An Efficient Watermarking Method for Latent Diffusion Models via Low-Rank Adaptation

arXiv 2024

2024

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

arXiv 2023

2023

Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks

arXiv 2023

2023

Efficient Backdoor Attacks for Deep Neural Networks in Real-world Scenarios

arXiv 2023

2023

Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning

ICCV 2023

2023

Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

CVPR 2023

2022

Large Language Models are Better Reasoners with Self-Verification

arXiv 2022

2022

Learning to Locate Visual Answer in Video Corpus Using Question

arXiv 2022

2022

Deformable DETR: Deformable Transformers for End-to-End Object Detection

2020

VL-BERT: Pre-training of Generic Visual-Linguistic Representations

ICLR 2020

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 24 papers