Zhengzhong Tu
- Papers: 28
Authored papers
SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation
arXiv 2026
Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling
arXiv 2026
PISCO: Precise Video Instance Insertion with Sparse Control
arXiv 2026
The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics
arXiv 2026
Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models
arXiv 2026
VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
arXiv 2025
Generative AI for Autonomous Driving: Frontiers and Opportunities
arXiv 2025
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
arXiv 2025
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
arXiv 2025
LangCoop: Collaborative Driving with Language
arXiv 2025
NTIRE 2025 Challenge on UGC Video Enhancement: Methods and Results
arXiv 2025
LLMs Can Get "Brain Rot"!
arXiv 2025
4KAgent: Agentic Any Image to 4K Super-Resolution
arXiv 2025
UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving
ICCV 2025
Complex LLM Planning via Automated Heuristics Discovery
arXiv 2025
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
arXiv 2025
Can Large Vision Language Models Read Maps Like a Human?
arXiv 2025
Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
arXiv 2025
MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning
arXiv 2025
Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models
arXiv 2024
AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results
arXiv 2024
OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving
arXiv 2024
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving
arXiv 2024
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing
CVPR 2025 1
MULLER: Multilayer Laplacian Resizer for Vision
ICCV 2023 1
CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
CVPR 2024 1
MAXIM: Multi-Axis MLP for Image Processing
CVPR 2022 1
MaxViT: Multi-Axis Vision Transformer
arXiv 2022
Frequent co-authors
10 from 28 papers