Siliang Tang

WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation

arXiv 2025

On Path to Multimodal Generalist: General-Level and General-Bench

arXiv 2025

WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing

arXiv 2025

MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models

arXiv 2025

The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation

ICCV 2025

Graph Retrieval-Augmented Generation: A Survey

arXiv 2024

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models

arXiv 2024

Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation

arXiv 2024

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning

arXiv 2024

GraphCLIP: Enhancing Transferability in Graph Foundation Models for Text-Attributed Graphs

arXiv 2024

Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization for Few-shot Generalization

arXiv 2023

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

arXiv 2023

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

CVPR 2024

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World

ICCV 2023