
Zhengzhong Tu

Papers
28

Authored papers

28

SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation

arXiv 2026

2026

Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling

arXiv 2026

2026

PISCO: Precise Video Instance Insertion with Sparse Control

arXiv 2026

2026

The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics

arXiv 2026

2026

Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models

arXiv 2026

2026

VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction

arXiv 2025

2025

Generative AI for Autonomous Driving: Frontiers and Opportunities

arXiv 2025

2025

DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization

arXiv 2025

2025

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective

arXiv 2025

2025

LangCoop: Collaborative Driving with Language

arXiv 2025

2025

NTIRE 2025 Challenge on UGC Video Enhancement: Methods and Results

arXiv 2025

2025

LLMs Can Get "Brain Rot"!

arXiv 2025

2025

4KAgent: Agentic Any Image to 4K Super-Resolution

arXiv 2025

2025

UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving

ICCV 2025

2025

Complex LLM Planning via Automated Heuristics Discovery

arXiv 2025

2025

Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization

arXiv 2025

2025

Can Large Vision Language Models Read Maps Like a Human?

arXiv 2025

2025

Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs

arXiv 2025

2025

MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning

arXiv 2025

2025

Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models

arXiv 2024

2024

AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results

arXiv 2024

2024

OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving

arXiv 2024

2024

AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving

arXiv 2024

2024

Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing

CVPR 2025

2024

MULLER: Multilayer Laplacian Resizer for Vision

ICCV 2023

2023

CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation

CVPR 2024

2023

MAXIM: Multi-Axis MLP for Image Processing

CVPR 2022

2022

MaxViT: Multi-Axis Vision Transformer

arXiv 2022

2022

Affiliations

No known affiliations.

Frequent co-authors

10 frequent co-authors across 28 papers