BoWen Zhang

Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material

arXiv 2025

OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problem with Reasoning Large Language Model

arXiv 2025

Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets

arXiv 2025

Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis

ICCV 2025

AXLearn: Modular Large Model Training on Heterogeneous Infrastructure

arXiv 2025

rStar2-Agent: Agentic Reasoning Technical Report

arXiv 2025

C-MTCSD: A Chinese Multi-Turn Conversational Stance Detection Dataset

arXiv 2025

Structured 3D Latents for Scalable and Versatile 3D Generation

CVPR 2025

Emu3: Next-Token Prediction is All You Need

arXiv 2024

Improve Vision Language Model Chain-of-thought Reasoning

arXiv 2024

MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in Conversations with Multimodal Language Models

arXiv 2024

Ferret: Refer and Ground Anything Anywhere at Any Granularity

arXiv 2023

BPKD: Boundary Privileged Knowledge Distillation For Semantic Segmentation

arXiv 2023

SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers

arXiv 2023

VeCLIP: Improving CLIP Training via Visual-enriched Captions

arXiv 2023

MOFI: Learning Image Representations from Noisy Entity Annotated Images

arXiv 2023

Compressing LLMs: The Truth is Rarely Pure and Never Simple

arXiv 2023

SegReg: Segmenting OARs by Registering MR Images and CT Annotations

arXiv 2023

Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment

arXiv 2023

Large Language Models as Zero-Shot Human Models for Human-Robot Interaction

arXiv 2023