Jiebo Luo
Authored papers (47)
Audio-Visual Intelligence in Large Foundation Models
arXiv 2026
JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation
arXiv 2026
Aurora: Unified Video Editing with a Tool-Using Agent
arXiv 2026
SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models
arXiv 2026
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning
arXiv 2026
OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting
ICCV 2025
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
arXiv 2025
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
arXiv 2025
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension
arXiv 2025
Unleashing Hour-Scale Video Training for Long Video-Language Understanding
arXiv 2025
Latent Chain-of-Thought for Visual Reasoning
arXiv 2025
Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese
arXiv 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
arXiv 2025
Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs
arXiv 2025
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization
arXiv 2025
On Path to Multimodal Generalist: General-Level and General-Bench
arXiv 2025
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
arXiv 2025
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
arXiv 2025
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
arXiv 2025
Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding
arXiv 2025
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
arXiv 2024
Autoregressive Models in Vision: A Survey
arXiv 2024
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
CVPR 2025
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
arXiv 2024
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
arXiv 2024
PromptFix: You Prompt and We Fix the Photo
arXiv 2024
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
arXiv 2024
Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach
arXiv 2024
SePPO: Semi-Policy Preference Optimization for Diffusion Alignment
arXiv 2024
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance
arXiv 2024
GaussianStyle: Gaussian Head Avatar via StyleGAN
arXiv 2024
A Survey of Large Language Models in Medicine: Progress, Application, and Challenge
arXiv 2023
Video Understanding with Large Language Models: A Survey
arXiv 2023
Grounding 3D Object Affordance from 2D Interactions in Images
ICCV 2023
Spatial-Aware Token for Weakly Supervised Object Localization
ICCV 2023
Mixture of Weak & Strong Experts on Graphs
arXiv 2023
VideoXum: Cross-modal Visual and Textural Summarization of Videos
arXiv 2023
GPT-4V(ision) as A Social Media Analysis Engine
arXiv 2023
Deceptive Fairness Attacks on Graphs via Meta Learning
arXiv 2023
Computational Assessment of Hyperpartisanship in News Titles
arXiv 2023
Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation
arXiv 2023
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment
arXiv 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
arXiv 2022
Automatic Relation-aware Graph Network Proliferation
CVPR 2022
Holistic Visual-Textual Sentiment Analysis with Prior Models
arXiv 2022
Enhanced Aspect-Based Sentiment Analysis Models with Progressive Self-supervised Attention Learning
arXiv 2021
TGIF: A New Dataset and Benchmark on Animated GIF Description