Xin Zhou
Authored papers (26)
HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation
arXiv 2026
When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models
arXiv 2026
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
arXiv 2026
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
ICCV 2025
Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception
arXiv 2025
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
arXiv 2025
Learning Item Representations Directly from Multimodal Features for Effective Recommendation
arXiv 2025
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
arXiv 2025
Step-GUI Technical Report
arXiv 2025
S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
arXiv 2025
CM$^3$: Calibrating Multimodal Recommendation
arXiv 2025
MINIMA: Modality Invariant Image Matching
CVPR 2025
Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning
arXiv 2024
LongHeads: Multi-Head Attention is Secretly a Long Context Processor
arXiv 2024
Rewriting the Code: A Simple Method for Large Language Model Augmented Code Search
arXiv 2024
Are Large Language Models Good Prompt Optimizers?
arXiv 2024
SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap
arXiv 2024
CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences
arXiv 2024
Disentangled Graph Variational Auto-Encoder for Multimodal Recommendation with Interpretability
arXiv 2024
Better Zero-Shot Reasoning with Role-Play Prompting
arXiv 2023
Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models
arXiv 2023
SoccerNet 2023 Challenges Results
arXiv 2023
SoccerNet 2022 Challenges Results
arXiv 2022
A Tale of Two Graphs: Freezing and Denoising Graph Structures for Multimodal Recommendation
arXiv 2022
Bootstrap Latent Representations for Multi-modal Recommendation
arXiv 2022
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
arXiv 2021
Frequent co-authors
10 from 26 papers
Dingkang Liang
Xiang Bai
Dingyuan Zhang
Zhiqi Shen
Alexandre Alahi
Anthony Cioppa
Bernard Ghanem
Christophe De Vleeschouwer
Floriane Magera
Marc Van Droogenbroeck