Shengqiong Wu
Authored papers (14)
Audio-Visual Intelligence in Large Foundation Models
arXiv 2026
JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation
arXiv 2026
UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark
arXiv 2026
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
arXiv 2025
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
arXiv 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
arXiv 2025
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization
arXiv 2025
Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding
arXiv 2025
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
arXiv 2025
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation
arXiv 2025
On Path to Multimodal Generalist: General-Level and General-Bench
arXiv 2025
Towards Semantic Equivalence of Tokenization in Multimodal LLM
arXiv 2024
NExT-GPT: Any-to-Any Multimodal LLM
arXiv 2023
LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model
arXiv 2023
Frequent co-authors: 10 (from 14 papers)