Jiahao Wang
- Papers: 33
Authored papers (33)
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
arXiv 2026
EVA: Efficient Reinforcement Learning for End-to-End Video Agent
arXiv 2026
Kimi K2.5: Visual Agentic Intelligence
arXiv 2026
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
arXiv 2025
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
arXiv 2025
T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation
arXiv 2025
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
arXiv 2025
OmniGen2: Exploration to Advanced Multimodal Generation
arXiv 2025
ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding
arXiv 2025
Fitness aligned structural modeling enables scalable virtual screening with AuroBind
arXiv 2025
World-in-World: World Models in a Closed-Loop World
arXiv 2025
Revisiting Model Interpolation for Efficient Reasoning
arXiv 2025
Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark
arXiv 2025
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation
arXiv 2025
Beyond Multiple Choice: Verifiable OpenQA for Robust Vision-Language RFT
arXiv 2025
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
arXiv 2024
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
arXiv 2024
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
CVPR 2025
LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models
arXiv 2024
LLaMA Pro: Progressive LLaMA with Block Expansion
arXiv 2024
Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis
arXiv 2024
Mixture-of-Subspaces in Low-Rank Adaptation
arXiv 2024
Adapting LLaMA Decoder to Vision Transformer
arXiv 2024
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
arXiv 2024
Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models
arXiv 2024
PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization
arXiv 2024
A Survey on Data Selection for LLM Instruction Tuning
arXiv 2024
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast
arXiv 2024
Memory-and-Anticipation Transformer for Online Action Understanding
ICCV 2023
MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment
arXiv 2022
Attentions Help CNNs See Better: Attention-based Hybrid Image Quality Assessment Network
arXiv 2022
Region-Adaptive Deformable Network for Image Quality Assessment
arXiv 2021
FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation
arXiv 2021
Frequent co-authors
10 from 33 papers