Haiwen Diao
Authored papers (15)
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
arXiv 2026
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation
arXiv 2026
VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?
arXiv 2026
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
arXiv 2025
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models
ICCV 2025
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
arXiv 2025
Visual Jigsaw Post-Training Improves MLLMs
arXiv 2025
Autoregressive Video Generation without Vector Quantization
arXiv 2024
Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching
arXiv 2024
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
arXiv 2024
GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning
arXiv 2024
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
arXiv 2024
Plug-and-Play Regulators for Image-Text Matching
arXiv 2023
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
CVPR 2024
Similarity Reasoning and Filtration for Image-Text Matching
arXiv 2021
Frequent co-authors
10 from 15 papers