Hao Yang
- Papers: 34
Authored papers (34)
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation
arXiv 2026
LongCat-Flash-Thinking-2601 Technical Report
arXiv 2026
LongCat-Next: Lexicalizing Modalities as Discrete Tokens
arXiv 2026
WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models
arXiv 2026
Towards Pixel-Level VLM Perception via Simple Points Prediction
arXiv 2026
Kimi K2.5: Visual Agentic Intelligence
arXiv 2026
Kimi-Audio Technical Report
arXiv 2025
Kimi-VL Technical Report
arXiv 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
arXiv 2025
StereoGen: High-quality Stereo Image Generation from a Single Image
ICCV 2025
UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer
ICCV 2025
Learning from Peers in Reasoning Models
arXiv 2025
Waver: Wave Your Way to Lifelike Video Generation
arXiv 2025
Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation
arXiv 2025
Omni-Video: Democratizing Unified Video Understanding and Generation
arXiv 2025
Met$^2$Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems
ICCV 2025
Truth in the Few: High-Value Data Selection for Efficient Multi-Modal Reasoning
arXiv 2025
A Preliminary Study for GPT-4o on Image Restoration
arXiv 2025
DeepSeek-VL: Towards Real-World Vision-Language Understanding
arXiv 2024
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
arXiv 2024
Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining
arXiv 2024
Adapting Large Language Models to Log Analysis with Interpretable Domain Knowledge
arXiv 2024
An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation
arXiv 2024
What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance
arXiv 2024
Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
arXiv 2024
Why Not Transform Chat Large Language Models to Non-English?
arXiv 2024
Qwen Technical Report
arXiv 2023
A Survey on Large Language Model based Autonomous Agents
arXiv 2023
User Behavior Simulation with Large Language Model based Agents
arXiv 2023
CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning
arXiv 2023
AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose
arXiv 2023
InterFormer: Real-time Interactive Image Segmentation
ICCV 2023
An Early Evaluation of GPT-4V(ision)
arXiv 2023
Reliable Representations Make A Stronger Defender: Unsupervised Structure Refinement for Robust GNN
arXiv 2022
Frequent co-authors
10 from 34 papers