Hao Wang

Papers
80

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

80

HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction

arXiv 2026

2026

OmniGAIA: Towards Native Omni-Modal AI Agents

arXiv 2026

2026

Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

arXiv 2026

2026

S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation

arXiv 2026

2026

SNLP: Layer-Parallel Inference via Structured Newton Corrections

arXiv 2026

2026

MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments

arXiv 2026

2026

T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization

arXiv 2026

2026

Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models

arXiv 2026

2026

Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs

arXiv 2026

2026

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

arXiv 2025

2025

DeepAgent: A General Reasoning Agent with Scalable Toolsets

arXiv 2025

2025

InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation

arXiv 2025

2025

Training Video Foundation Models with NVIDIA NeMo

arXiv 2025

2025

Cosmos World Foundation Model Platform for Physical AI

arXiv 2025

2025

OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain

arXiv 2025

2025

The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering

arXiv 2025

2025

LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects

arXiv 2025

2025

An Empirical Study on Prompt Compression for Large Language Models

arXiv 2025

2025

DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models

arXiv 2025

2025

A Survey on Latent Reasoning

arXiv 2025

2025

Chronos-2: From Univariate to Universal Forecasting

arXiv 2025

2025

Kwai Keye-VL 1.5 Technical Report

arXiv 2025

2025

Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

arXiv 2025

2025

VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction

arXiv 2025

2025

h-calibration: Rethinking Classifier Recalibration with Probabilistic Error-Bounded Objective

arXiv 2025

2025

SQuat: Subspace-orthogonal KV Cache Quantization

arXiv 2025

2025

Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning

arXiv 2025

2025

Tady: A Neural Disassembler without Structural Constraint Violations

arXiv 2025

2025

Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs

arXiv 2025

2025

MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents

arXiv 2025

2025

ImageRef-VL: Enabling Contextual Image Referencing in Vision-Language Models

arXiv 2025

2025

AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives

arXiv 2025

2025

Beyond the Surface: Measuring Self-Preference in LLM Judgments

arXiv 2025

2025

MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind

arXiv 2025

2025

UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning

arXiv 2025

2025

Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT?

arXiv 2025

2025

Chronos: Learning the Language of Time Series

arXiv 2024

2024

Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

arXiv 2024

2024

Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition

CVPR 2025 1

2024

StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements

CVPR 2025 1

2024

Nemotron-4 340B Technical Report

arXiv 2024

2024

Implicit In-context Learning

arXiv 2024

2024

An Engorgio Prompt Makes Large Language Model Babble on

arXiv 2024

2024

Energy-Based Concept Bottleneck Models: Unifying Prediction, Concept Intervention, and Probabilistic Interpretations

arXiv 2024

2024

Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models

arXiv 2024

2024

Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching

arXiv 2024

2024

Tracking the Feature Dynamics in LLM Training: A Mechanistic Study

arXiv 2024

2024

Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset

arXiv 2024

2024

Medical MLLM is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models

arXiv 2024

2024

Raidar: geneRative AI Detection viA Rewriting

arXiv 2024

2024

Continual Learning of Large Language Models: A Comprehensive Survey

arXiv 2024

2024

OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

arXiv 2024

2024

StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models

arXiv 2024

2024

AutoFlow: Automated Workflow Generation for Large Language Model Agents

arXiv 2024

2024

CLAP: Learning Transferable Binary Code Representations with Natural Language Supervision

arXiv 2024

2024

TimeMIL: Advancing Multivariate Time Series Classification via a Time-aware Multiple Instance Learning

arXiv 2024

2024

Beyond MOT: Semantic Multi-Object Tracking

arXiv 2024

2024

Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models

arXiv 2024

2024

Training-Free Bayesianization for Low-Rank Adapters of Large Language Models

arXiv 2024

2024

All in an Aggregated Image for In-Image Learning

arXiv 2024

2024

ChatHaruhi: Reviving Anime Character in Reality via Large Language Model

arXiv 2023

2023

A Survey on Large Language Models for Recommendation

arXiv 2023

2023

GPFL: Simultaneously Learning Global and Personalized Feature Information for Personalized Federated Learning

ICCV 2023 1

2023

Kanbun-LM: Reading and Translating Classical Chinese in Japanese Methods by Language Models

arXiv 2023

2023

Taxonomy-Structured Domain Adaptation

arXiv 2023

2023

DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading

arXiv 2023

2023

Never Lost in the Middle: Mastering Long-Context Question Answering with Position-Agnostic Decompositional Training

arXiv 2023

2023

Woodpecker: Hallucination Correction for Multimodal Large Language Models

arXiv 2023

2023

Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting

NeurIPS 2023 11

2023

ProAgent: From Robotic Process Automation to Agentic Process Automation

arXiv 2023

2023

MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation

arXiv 2023

2023

DBCopilot: Natural Language Querying over Massive Databases via Schema Routing

arXiv 2023

2023

UUKG: Unified Urban Knowledge Graph Dataset for Urban Spatiotemporal Prediction

2023

Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models

arXiv 2023

2023

Robust Perception through Equivariance

arXiv 2022

2022

Knowledge Mining with Scene Text for Fine-Grained Recognition

CVPR 2022 1

2022

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction

ICCV 2021 10

2021

Temporal Memory Attention for Video Semantic Segmentation

arXiv 2021

2021

Repairing without Retraining: Avoiding Disparate Impact with Counterfactual Distributions

arXiv 2019

2019

Rethinking Knowledge Graph Propagation for Zero-Shot Learning

2018

Affiliations

No known affiliations.

Frequent co-authors

10

from 80 papers