
Peng Jin


Authored papers (20)

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

arXiv 2025

2025

WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation

arXiv 2025

2025

MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation

arXiv 2025

2025

Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward

arXiv 2025

2025

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

arXiv 2024

2024

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

arXiv 2024

2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

arXiv 2024

2024

MoH: Multi-Head Attention as Mixture-of-Head Attention

arXiv 2024

2024

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

arXiv 2024

2024

Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection

arXiv 2024

2024

Next Patch Prediction for Autoregressive Visual Generation

arXiv 2024

2024

Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation

arXiv 2024

2024

LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference

arXiv 2024

2024

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

2023

DiffusionRet: Generative Text-Video Retrieval with Diffusion Model

ICCV 2023

2023

Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

CVPR 2023

2023

Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting

arXiv 2023

2023

FreestyleRet: Retrieving Images from Style-Diversified Queries

arXiv 2023

2023

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment

arXiv 2023

2023

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

CVPR 2024

2023

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 20 papers)