Baotian Hu

Papers: 32

Attribution

Affiliations & profile: Semantic Scholar

Authored papers (32)

WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments

arXiv 2026

LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding

arXiv 2026

LMEB: Long-horizon Memory Embedding Benchmark

arXiv 2026

ComfyUI-R1: Exploring Reasoning Models for Workflow Generation

arXiv 2025

FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces

arXiv 2025

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

arXiv 2025

AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation

arXiv 2025

VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension

arXiv 2025

A Unified Agentic Framework for Evaluating Conditional Image Generation

arXiv 2025

Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data

arXiv 2025

UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE

arXiv 2025

VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization

arXiv 2025

KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model

arXiv 2025

Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs

arXiv 2025

Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents

arXiv 2025

Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

arXiv 2024

Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

arXiv 2024

SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection

arXiv 2024

Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment

arXiv 2024

In-Context Learning State Vector with Inner and Momentum Optimization

arXiv 2024

A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation

arXiv 2024

LMEye: An Interactive Perception Network for Large Language Models

arXiv 2023

A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues

arXiv 2023

Towards Reasoning in Large Language Models via Multi-Agent Peer Review Collaboration

arXiv 2023

Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation

arXiv 2023

A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text

arXiv 2023

A Read-and-Select Framework for Zero-shot Entity Linking

arXiv 2023

A Survey of Large Language Models Attribution

arXiv 2023

Revisiting Sparse Retrieval for Few-shot Entity Linking

arXiv 2023

Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs

arXiv 2023

An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks

arXiv 2022

Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark

arXiv 2022

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 32 papers)