Jiayi Ji
Papers: 17
Authored papers
MVGGT: Multimodal Visual Geometry Grounded Transformer for Multiview 3D Referring Expression Segmentation
arXiv 2026
RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning
arXiv 2025
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension
arXiv 2025
Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach
arXiv 2025
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization
arXiv 2025
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
arXiv 2025
ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation
arXiv 2025
SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence
arXiv 2025
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
arXiv 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
arXiv 2024
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
arXiv 2024
Multi-branch Collaborative Learning Network for 3D Visual Grounding
arXiv 2024
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
arXiv 2024
TraDiffusion: Trajectory-Based Training-Free Image Generation
arXiv 2024
JM3D & JM3D-LLM: Elevating 3D Understanding with Joint Multi-modal Cues
arXiv 2023
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
ICCV 2023
X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation
arXiv 2023
Frequent co-authors: 10 (from 17 papers)