Xihan Wei

Papers: 12

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding

arXiv 2026

LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models

CVPR 2025

R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning

arXiv 2025

HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding

arXiv 2025

CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization

arXiv 2025

ViSpeak: Visual Instruction Feedback in Streaming Videos

ICCV 2025

LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding

arXiv 2025

Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness

arXiv 2025

IRG-MotionLLM: Interleaving Motion Generation, Assessment and Refinement for Text-to-Motion Generation

arXiv 2025

HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context

arXiv 2025

LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

arXiv 2025

LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning

arXiv 2025

Affiliations

No known affiliations.

Frequent co-authors

from 12 papers

Boyuan Sun

Jiaxing Zhao

Qize Yang

Shenghao Fu

Wei-Shi Zheng

Detao Bai

Liefeng Bo

Qibin Hou

Xiang Chen

Xiaohua Xie