Xin Zhou

Papers
26

Attribution

Affiliations & profile
Semantic Scholar
Authored papers

26

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

arXiv 2026

2026

When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models

arXiv 2026

2026

Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

arXiv 2026

2026

HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation

ICCV 2025

2025

Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception

arXiv 2025

2025

Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving

arXiv 2025

2025

Learning Item Representations Directly from Multimodal Features for Effective Recommendation

arXiv 2025

2025

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

arXiv 2025

2025

Step-GUI Technical Report

arXiv 2025

2025

S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning

arXiv 2025

2025

CM$^3$: Calibrating Multimodal Recommendation

arXiv 2025

2025

MINIMA: Modality Invariant Image Matching

CVPR 2025 1

2024

Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning

arXiv 2024

2024

LongHeads: Multi-Head Attention is Secretly a Long Context Processor

arXiv 2024

2024

Rewriting the Code: A Simple Method for Large Language Model Augmented Code Search

arXiv 2024

2024

Are Large Language Models Good Prompt Optimizers?

arXiv 2024

2024

SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap

arXiv 2024

2024

CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences

arXiv 2024

2024

Disentangled Graph Variational Auto-Encoder for Multimodal Recommendation with Interpretability

arXiv 2024

2024

Better Zero-Shot Reasoning with Role-Play Prompting

arXiv 2023

2023

Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models

arXiv 2023

2023

SoccerNet 2023 Challenges Results

arXiv 2023

2023

SoccerNet 2022 Challenges Results

arXiv 2022

2022

A Tale of Two Graphs: Freezing and Denoising Graph Structures for Multimodal Recommendation

arXiv 2022

2022

Bootstrap Latent Representations for Multi-modal Recommendation

arXiv 2022

2022

Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning

arXiv 2021

2021

Affiliations

No known affiliations.

Frequent co-authors

10

from 26 papers