Xiaojuan Qi
- Papers: 41
Authored papers
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
arXiv 2026
OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence
arXiv 2026
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
arXiv 2026
SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing
arXiv 2026
Stable Velocity: A Variance Perspective on Flow Matching
arXiv 2026
Stereo World Model: Camera-Guided Stereo Video Generation
arXiv 2026
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation
arXiv 2026
Scaling RL to Long Videos
arXiv 2025
UniTok: A Unified Tokenizer for Visual Generation and Understanding
arXiv 2025
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
arXiv 2025
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models
arXiv 2025
QDM: Quadtree-Based Region-Adaptive Sparse Diffusion Models for Efficient Image Super-Resolution
arXiv 2025
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
arXiv 2025
UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation
arXiv 2025
Hita: Holistic Tokenizer for Autoregressive Image Generation
ICCV 2025
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments
arXiv 2025
"Principal Components" Enable A New Language of Images
ICCV 2025
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
CVPR 2025
UniScene: Unified Occupancy-centric Driving Scene Generation
CVPR 2025
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
CVPR 2025
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
arXiv 2024
V-IRL: Grounding Virtual Intelligence in Real Life
arXiv 2024
EscherNet: A Generative Model for Scalable View Synthesis
CVPR 2024
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
arXiv 2024
An empirical study of LLaMA3 quantization: from LLMs to MLLMs
arXiv 2024
TEXGen: a Generative Diffusion Model for Mesh Textures
arXiv 2024
MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More
arXiv 2024
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
arXiv 2024
What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights
arXiv 2024
Can OOD Object Detectors Learn from Foundation Models?
arXiv 2024
VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes
CVPR 2024
Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video
ICCV 2023
MarS3D: A Plug-and-Play Motion-Aware Model for Semantic Segmentation on Multi-Scan 3D Point Clouds
Learning a Room with the Occ-SDF Hybrid: Signed Distance Function Mingled with Occupancy Aids Scene Representation
ICCV 2023
Self-Supervised Visual Representation Learning with Semantic Grouping
arXiv 2022
Towards Efficient and Scale-Robust Ultra-High-Definition Image Demoireing
arXiv 2022
Is synthetic data from generative models ready for image recognition?
arXiv 2022
Parametric Classification for Generalized Category Discovery: A Baseline Study
ICCV 2023
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
CVPR 2023
Image Inpainting via Generative Multi-column Convolutional Neural Networks
Frequent co-authors
10 from 41 papers