Hao Wang
- Papers: 80
Authored papers
HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction
arXiv 2026
OmniGAIA: Towards Native Omni-Modal AI Agents
arXiv 2026
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model
arXiv 2026
S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation
arXiv 2026
SNLP: Layer-Parallel Inference via Structured Newton Corrections
arXiv 2026
MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments
arXiv 2026
T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization
arXiv 2026
Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models
arXiv 2026
Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs
arXiv 2026
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
arXiv 2025
DeepAgent: A General Reasoning Agent with Scalable Toolsets
arXiv 2025
InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation
arXiv 2025
Training Video Foundation Models with NVIDIA NeMo
arXiv 2025
Cosmos World Foundation Model Platform for Physical AI
arXiv 2025
OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain
arXiv 2025
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering
arXiv 2025
LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects
arXiv 2025
An Empirical Study on Prompt Compression for Large Language Models
arXiv 2025
DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models
arXiv 2025
A Survey on Latent Reasoning
arXiv 2025
Chronos-2: From Univariate to Universal Forecasting
arXiv 2025
Kwai Keye-VL 1.5 Technical Report
arXiv 2025
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
arXiv 2025
VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction
arXiv 2025
h-calibration: Rethinking Classifier Recalibration with Probabilistic Error-Bounded Objective
arXiv 2025
SQuat: Subspace-orthogonal KV Cache Quantization
arXiv 2025
Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning
arXiv 2025
Tady: A Neural Disassembler without Structural Constraint Violations
arXiv 2025
Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs
arXiv 2025
MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents
arXiv 2025
ImageRef-VL: Enabling Contextual Image Referencing in Vision-Language Models
arXiv 2025
AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives
arXiv 2025
Beyond the Surface: Measuring Self-Preference in LLM Judgments
arXiv 2025
MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind
arXiv 2025
UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning
arXiv 2025
Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT?
arXiv 2025
Chronos: Learning the Language of Time Series
arXiv 2024
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
arXiv 2024
Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition
CVPR 2025 1
StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements
CVPR 2025 1
Nemotron-4 340B Technical Report
arXiv 2024
Implicit In-context Learning
arXiv 2024
An Engorgio Prompt Makes Large Language Model Babble on
arXiv 2024
Energy-Based Concept Bottleneck Models: Unifying Prediction, Concept Intervention, and Probabilistic Interpretations
arXiv 2024
Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models
arXiv 2024
Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching
arXiv 2024
Tracking the Feature Dynamics in LLM Training: A Mechanistic Study
arXiv 2024
Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset
arXiv 2024
Medical MLLM is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models
arXiv 2024
Raidar: geneRative AI Detection viA Rewriting
arXiv 2024
Continual Learning of Large Language Models: A Comprehensive Survey
arXiv 2024
OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
arXiv 2024
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models
arXiv 2024
AutoFlow: Automated Workflow Generation for Large Language Model Agents
arXiv 2024
CLAP: Learning Transferable Binary Code Representations with Natural Language Supervision
arXiv 2024
TimeMIL: Advancing Multivariate Time Series Classification via a Time-aware Multiple Instance Learning
arXiv 2024
Beyond MOT: Semantic Multi-Object Tracking
arXiv 2024
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models
arXiv 2024
Training-Free Bayesianization for Low-Rank Adapters of Large Language Models
arXiv 2024
All in an Aggregated Image for In-Image Learning
arXiv 2024
ChatHaruhi: Reviving Anime Character in Reality via Large Language Model
arXiv 2023
A Survey on Large Language Models for Recommendation
arXiv 2023
GPFL: Simultaneously Learning Global and Personalized Feature Information for Personalized Federated Learning
ICCV 2023 1
Kanbun-LM: Reading and Translating Classical Chinese in Japanese Methods by Language Models
arXiv 2023
Taxonomy-Structured Domain Adaptation
arXiv 2023
DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading
arXiv 2023
Never Lost in the Middle: Mastering Long-Context Question Answering with Position-Agnostic Decompositional Training
arXiv 2023
Woodpecker: Hallucination Correction for Multimodal Large Language Models
arXiv 2023
Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting
NeurIPS 2023 11
ProAgent: From Robotic Process Automation to Agentic Process Automation
arXiv 2023
MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation
arXiv 2023
DBCopilot: Natural Language Querying over Massive Databases via Schema Routing
arXiv 2023
UUKG: Unified Urban Knowledge Graph Dataset for Urban Spatiotemporal Prediction
Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models
arXiv 2023
Robust Perception through Equivariance
arXiv 2022
Knowledge Mining with Scene Text for Fine-Grained Recognition
CVPR 2022 1
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction
ICCV 2021 10
Temporal Memory Attention for Video Semantic Segmentation
arXiv 2021
Repairing without Retraining: Avoiding Disparate Impact with Counterfactual Distributions
arXiv 2019
Rethinking Knowledge Graph Propagation for Zero-Shot Learning