
Jiahao Wang

Papers
33



Authored papers


SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

arXiv 2026

2026

EVA: Efficient Reinforcement Learning for End-to-End Video Agent

arXiv 2026

2026

Kimi K2.5: Visual Agentic Intelligence

arXiv 2026

2026

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

arXiv 2025

2025

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

arXiv 2025

2025

T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation

arXiv 2025

2025

EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling

arXiv 2025

2025

OmniGen2: Exploration to Advanced Multimodal Generation

arXiv 2025

2025

ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding

arXiv 2025

2025

Fitness aligned structural modeling enables scalable virtual screening with AuroBind

arXiv 2025

2025

World-in-World: World Models in a Closed-Loop World

arXiv 2025

2025

Revisiting Model Interpolation for Efficient Reasoning

arXiv 2025

2025

Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark

arXiv 2025

2025

Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation

arXiv 2025

2025

Beyond Multiple Choice: Verifiable OpenQA for Robust Vision-Language RFT

arXiv 2025

2025

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

arXiv 2024

2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

arXiv 2024

2024

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

CVPR 2025

2024

LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models

arXiv 2024

2024

LLaMA Pro: Progressive LLaMA with Block Expansion

arXiv 2024

2024

Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis

arXiv 2024

2024

Mixture-of-Subspaces in Low-Rank Adaptation

arXiv 2024

2024

Adapting LLaMA Decoder to Vision Transformer

arXiv 2024

2024

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

arXiv 2024

2024

Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models

arXiv 2024

2024

PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization

arXiv 2024

2024

A Survey on Data Selection for LLM Instruction Tuning

arXiv 2024

2024

Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast

arXiv 2024

2024

Memory-and-Anticipation Transformer for Online Action Understanding

ICCV 2023

2023

MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment

arXiv 2022

2022

Attentions Help CNNs See Better: Attention-based Hybrid Image Quality Assessment Network

arXiv 2022

2022

Region-Adaptive Deformable Network for Image Quality Assessment

arXiv 2021

2021

FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

arXiv 2021

2021

Affiliations

No known affiliations.

Frequent co-authors

10 (from 33 papers)