Bohan Zhuang

Papers: 40

Authored papers

TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction

arXiv 2026

Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization

arXiv 2026

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation

arXiv 2026

TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

arXiv 2026

Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

arXiv 2026

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

arXiv 2026

CoV: Chain-of-View Prompting for Spatial Reasoning

arXiv 2026

Less Detail, Better Answers: Degradation-Driven Prompting for VQA

arXiv 2026

ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS

arXiv 2025

Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting

arXiv 2025

Few-Step Distillation for Text-to-Image Generation: A Practical Guide

arXiv 2025

VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction

arXiv 2025

Geometrically-Constrained Agent for Spatial Reasoning

arXiv 2025

PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation

arXiv 2025

BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation

arXiv 2025

Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation

arXiv 2025

Motion Anything: Any to Motion Generation

arXiv 2025

Neighboring Autoregressive Modeling for Efficient Visual Generation

ICCV 2025

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

arXiv 2024

MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views

arXiv 2024

KMM: Key Frame Mask Mamba for Extended Motion Generation

arXiv 2024

Streaming Video Diffusion: Online Video Editing with Diffusion Models

arXiv 2024

InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation

arXiv 2024

Evaluating and Advancing Multimodal Large Language Models in Ability Lens

arXiv 2024

LongVLM: Efficient Long Video Understanding via Large Language Models

arXiv 2024

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

arXiv 2024

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

arXiv 2024

ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality

arXiv 2024

ModaVerse: Efficiently Transforming Modalities with LLMs

CVPR 2024

Stitchable Neural Networks

CVPR 2023

Object-aware Inversion and Reassembly for Image Editing

arXiv 2023

Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning

ICCV 2023

EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

arXiv 2023

LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning

arXiv 2023

QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models

arXiv 2023

Stitched ViTs are Flexible Vision Backbones

arXiv 2023

Fast Vision Transformers with HiLo Attention

arXiv 2022

EcoFormer: Energy-Saving Attention with Linear Complexity

arXiv 2022

Mesa: A Memory-saving Training Framework for Transformers

arXiv 2021

Scalable Vision Transformers with Hierarchical Pooling

ICCV 2021

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 40 papers)