Zeyu Zhang
- Papers: 67
Authored papers
TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction
arXiv 2026
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
arXiv 2026
Code2Worlds: Empowering Coding LLMs for 4D World Generation
arXiv 2026
CoV: Chain-of-View Prompting for Spatial Reasoning
arXiv 2026
GeneralVLA: Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning
arXiv 2026
AnyDepth: Depth Estimation Made Easy
arXiv 2026
LiveWorld: Simulating Out-of-Sight Dynamics in Generative Video World Models
arXiv 2026
Cost-Efficient RAG for Entity Matching with LLMs: A Blocking-based Exploration
arXiv 2026
UniMesh: Unifying 3D Mesh Understanding and Generation
arXiv 2026
PresentAgent-2: Towards Generalist Multimodal Presentation Agents
arXiv 2026
MMA: Multimodal Memory Agent
arXiv 2026
V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval
arXiv 2026
Less Detail, Better Answers: Degradation-Driven Prompting for VQA
arXiv 2026
MWM: Mobile World Models for Action-Conditioned Consistent Prediction
arXiv 2026
Light4D: Training-Free Extreme Viewpoint 4D Video Relighting
arXiv 2026
MoRL: Reinforced Reasoning for Unified Motion Understanding and Generation
arXiv 2026
WebCryptoAgent: Agentic Crypto Trading with Web Informatics
arXiv 2026
StereoAdapter-2: Globally Structure-Consistent Underwater Stereo Depth Estimation
arXiv 2026
OCR-Agent: Agentic OCR with Capability and Memory Reflection
arXiv 2026
OmniOCR: Generalist OCR for Ethnic Minority Languages
arXiv 2026
3D CoCa v2: Contrastive Learners with Test-Time Search for Generalizable Spatial Intelligence
arXiv 2026
HSG: Hyperbolic Scene Graph
arXiv 2026
OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
arXiv 2025
Motion Anything: Any to Motion Generation
arXiv 2025
ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS
arXiv 2025
MediAug: Exploring Visual Augmentation in Medical Imaging
arXiv 2025
Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting
arXiv 2025
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
arXiv 2025
3D CoCa: Contrastive Learners are 3D Captioners
arXiv 2025
ReMoMask: Retrieval-Augmented Masked Motion Generation
arXiv 2025
VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction
arXiv 2025
MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots
arXiv 2025
Composing Concepts from Images and Videos via Concept-prompt Binding
arXiv 2025
BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation
arXiv 2025
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation
arXiv 2025
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
arXiv 2025
EvoVLA: Self-Evolving Vision-Language-Action Model
arXiv 2025
DragMesh: Interactive 3D Generation Made Easy
arXiv 2025
Nav-R1: Reasoning and Navigation in Embodied Scenes
arXiv 2025
PresentAgent: Multimodal Agent for Presentation Video Generation
arXiv 2025
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
arXiv 2025
MedConv: Convolutions Beat Transformers on Long-Tailed Bone Density Prediction
arXiv 2025
FDG-Diff: Frequency-Domain-Guided Diffusion Framework for Compressed Hazy Image Restoration
arXiv 2025
StereoAdapter: Adapting Stereo Depth Estimation to Underwater Scenes
arXiv 2025
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
arXiv 2025
EgoLCD: Egocentric Video Generation with Long Context Diffusion
arXiv 2025
VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery
arXiv 2025
IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting
arXiv 2025
SSS: Semi-Supervised SAM-2 with Efficient Prompting for Medical Imaging Segmentation
arXiv 2025
GAMED-Snake: Gradient-aware Adaptive Momentum Evolution Deep Snake Model for Multi-organ Segmentation
arXiv 2025
PathoHR: Breast Cancer Survival Prediction on High-Resolution Pathological Images
arXiv 2025
Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning
arXiv 2025
PedDet: Adaptive Spectral Optimization for Multimodal Pedestrian Detection
arXiv 2025
ProjectedEx: Enhancing Generation in Explainable AI for Prostate Cancer
arXiv 2025
DOEI: Dual Optimization of Embedding Information for Attention-Enhanced Class Activation Maps
arXiv 2025
JointViT: Modeling Oxygen Saturation Levels with Joint Supervision on Long-Tailed OCTA
arXiv 2024
A Survey on the Memory Mechanism of Large Language Model based Agents
arXiv 2024
Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion
arXiv 2024
KMM: Key Frame Mask Mamba for Extended Motion Generation
arXiv 2024
InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation
arXiv 2024
MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training
arXiv 2024
MedDet: Generative Adversarial Distillation for Efficient Cervical Disc Herniation Detection
arXiv 2024
DiabetesNet: A Deep Learning Approach to Diabetes Diagnosis
arXiv 2024
A Survey on Large Language Model based Autonomous Agents
arXiv 2023
User Behavior Simulation with Large Language Model based Agents
arXiv 2023
SegReg: Segmenting OARs by Registering MR Images and CT Annotations
arXiv 2023
X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events
ICCV 2023
Affiliations
Frequent co-authors
10 from 67 papers