Wenhao Huang
- Papers: 55
Authored papers (55)
In-Place Test-Time Training
arXiv 2026
YuE: Scaling Open Foundation Models for Long-Form Music Generation
arXiv 2025
Seed1.5-VL Technical Report
arXiv 2025
FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis
arXiv 2025
A Comprehensive Survey on Long Context Language Modeling
arXiv 2025
P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark
arXiv 2025
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
arXiv 2025
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
arXiv 2025
AutoMV: An Automatic Multi-Agent System for Music Video Generation
arXiv 2025
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
arXiv 2025
A Survey on Latent Reasoning
arXiv 2025
WideSearch: Benchmarking Agentic Broad Info-Seeking
arXiv 2025
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
arXiv 2025
A Systematic Analysis of Hybrid Linear Attention
arXiv 2025
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
arXiv 2025
DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World
arXiv 2025
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
arXiv 2025
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
arXiv 2025
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization
arXiv 2025
Reverse-Engineered Reasoning for Open-Ended Generation
arXiv 2025
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
arXiv 2025
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding
arXiv 2025
Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures
arXiv 2025
NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents
arXiv 2025
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values
arXiv 2025
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
arXiv 2025
DiscoX: Benchmarking Discourse-Level Translation task in Expert Domains
arXiv 2025
OmniBench: Towards The Future of Universal Omni-Language Models
arXiv 2024
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
arXiv 2024
ChatMusician: Understanding and Generating Music Intrinsically with LLM
arXiv 2024
AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation
arXiv 2024
Yi: Open Foundation Models by 01.AI
arXiv 2024
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
arXiv 2024
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
arXiv 2024
Foundation Models for Music: A Survey
arXiv 2024
m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers
arXiv 2024
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
arXiv 2024
SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents
arXiv 2024
Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation
arXiv 2024
MIO: A Foundation Model on Multimodal Tokens
arXiv 2024
II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models
arXiv 2024
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
arXiv 2024
Can MLLMs Understand the Deep Implication Behind Chinese Images?
arXiv 2024
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm
arXiv 2024
LIME: Less Is More for MLLM Evaluation
arXiv 2024
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
arXiv 2024
MMRA: A Benchmark for Evaluating Multi-Granularity and Multi-Image Relational Association Capabilities in Large Visual Language Models
arXiv 2024
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
CVPR 2024
Chinese Open Instruction Generalist: A Preliminary Release
arXiv 2023
LLaSM: Large Language and Speech Model
arXiv 2023
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
arXiv 2023
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
arXiv 2023
TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks
arXiv 2023
Can Large Language Models Understand Real-World Complex Instructions?
arXiv 2023
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
arXiv 2023
Frequent co-authors
10 from 55 papers