Yao Hu
- Papers: 33
Authored papers
VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation
arXiv 2026
Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling
arXiv 2026
Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training
arXiv 2026
FireRed-Image-Edit-1.0 Technical Report
arXiv 2026
FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System
arXiv 2026
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
arXiv 2026
FireRed-OCR Technical Report
arXiv 2026
Balancing Understanding and Generation in Discrete Diffusion Models
arXiv 2026
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
arXiv 2025
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
arXiv 2025
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
arXiv 2025
Redefining Machine Translation on Social Network Services with Large Language Models
arXiv 2025
MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning
arXiv 2025
GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
arXiv 2025
Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions
arXiv 2025
Interleaving Reasoning for Better Text-to-Image Generation
arXiv 2025
TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework
arXiv 2025
CQ-DINO: Mitigating Gradient Dilution via Category Queries for Vast Vocabulary Object Detection
arXiv 2025
VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models
ICCV 2025
InstantID: Zero-shot Identity-Preserving Generation in Seconds
arXiv 2024
Vript: A Video Is Worth Thousands of Words
arXiv 2024
NoteLLM-2: Multimodal Large Representation Models for Recommendation
arXiv 2024
Target-Driven Distillation: Consistency Distillation with Target Timestep Selection and Decoupled Guidance
arXiv 2024
Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective
arXiv 2024
LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
CVPR 2025
A Sanity Check for AI-generated Image Detection
arXiv 2024
VISA: Reasoning Video Object Segmentation via Large Language Models
arXiv 2024
Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation
arXiv 2024
Towards Open-Vocabulary Video Instance Segmentation
ICCV 2023
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
CVPR 2024
OvarNet: Towards Open-vocabulary Object Attribute Recognition
CVPR 2023
ZONE: Zero-Shot Instruction-Guided Local Editing
CVPR 2024
Controllable Mind Visual Diffusion Model
arXiv 2023
Frequent co-authors: 10 (from 33 papers)