HaoNing Wu
- Papers
- 45
Authored papers (45)
OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams
arXiv 2026
BabyVision: Visual Reasoning Beyond Language
arXiv 2026
WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models
arXiv 2026
PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning
arXiv 2026
Towards Pixel-Level VLM Perception via Simple Points Prediction
arXiv 2026
Kimi K2.5: Visual Agentic Intelligence
arXiv 2026
Kimi-VL Technical Report
arXiv 2025
Multi-Agent System for Comprehensive Soccer Understanding
arXiv 2025
Teaching LMMs for Image Quality Scoring and Interpreting
arXiv 2025
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning
arXiv 2025
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
arXiv 2025
SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding
arXiv 2025
ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks
arXiv 2025
SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass
arXiv 2025
MatchTime: Towards Automatic Soccer Game Commentary Generation
arXiv 2024
R-Bench: Are your Large Multimodal Model Robust to Real-world Corruptions?
arXiv 2024
Q-Ground: Image Quality Grounding with Large Multi-modality Models
arXiv 2024
Towards Universal Soccer Video Understanding
CVPR 2025
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation
CVPR 2025
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
arXiv 2024
Q-Bench+: A Benchmark for Multi-modal Foundation Models on Low-level Vision from Single Images to Pairs
arXiv 2024
A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
arXiv 2024
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding
arXiv 2024
AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results
arXiv 2024
Aria: An Open Multimodal Native Mixture-of-Experts Model
arXiv 2024
VQA$^2$: Visual Question Answering for Video Quality Assessment
arXiv 2024
Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare
arXiv 2024
MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities
ICCV 2025
CMC-Bench: Towards a New Paradigm of Visual Signal Compression
arXiv 2024
Q-Refine: A Perceptual Quality Refiner for AI-Generated Image
arXiv 2024
Dual-Branch Network for Portrait Image Quality Assessment
arXiv 2024
LIME: Less Is More for MLLM Evaluation
arXiv 2024
MMRA: A Benchmark for Evaluating Multi-Granularity and Multi-Image Relational Association Capabilities in Large Visual Language Models
arXiv 2024
TOPIQ: A Top-down Approach from Semantics to Distortions for Image Quality Assessment
arXiv 2023
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels
arXiv 2023
AGIQA-3K: An Open Database for AI-Generated Image Quality Assessment
arXiv 2023
Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models
arXiv 2023
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
CVPR 2024
Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach
arXiv 2023
Iterative Token Evaluation and Refinement for Real-World Super-Resolution
arXiv 2023
Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
arXiv 2023
Exploring the Naturalness of AI-Generated Images
arXiv 2023
Boost Video Frame Interpolation via Motion Adaptation
arXiv 2023
Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives
ICCV 2023
FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling
arXiv 2022
Frequent co-authors
10 from 45 papers