Bohan Zhuang
- Papers: 40
Authored papers (40)
TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction
arXiv 2026
Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization
arXiv 2026
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
arXiv 2026
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression
arXiv 2026
Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
arXiv 2026
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
arXiv 2026
CoV: Chain-of-View Prompting for Spatial Reasoning
arXiv 2026
Less Detail, Better Answers: Degradation-Driven Prompting for VQA
arXiv 2026
ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS
arXiv 2025
Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting
arXiv 2025
Few-Step Distillation for Text-to-Image Generation: A Practical Guide
arXiv 2025
VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction
arXiv 2025
Geometrically-Constrained Agent for Spatial Reasoning
arXiv 2025
PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation
arXiv 2025
BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation
arXiv 2025
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation
arXiv 2025
Motion Anything: Any to Motion Generation
arXiv 2025
Neighboring Autoregressive Modeling for Efficient Visual Generation
ICCV 2025
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
arXiv 2024
MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views
arXiv 2024
KMM: Key Frame Mask Mamba for Extended Motion Generation
arXiv 2024
Streaming Video Diffusion: Online Video Editing with Diffusion Models
arXiv 2024
InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation
arXiv 2024
Evaluating and Advancing Multimodal Large Language Models in Ability Lens
arXiv 2024
LongVLM: Efficient Long Video Understanding via Large Language Models
arXiv 2024
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
arXiv 2024
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
arXiv 2024
ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality
arXiv 2024
ModaVerse: Efficiently Transforming Modalities with LLMs
CVPR 2024 1
Stitchable Neural Networks
CVPR 2023 1
Object-aware Inversion and Reassembly for Image Editing
arXiv 2023
Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning
ICCV 2023 1
EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models
arXiv 2023
LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning
arXiv 2023
QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
arXiv 2023
Stitched ViTs are Flexible Vision Backbones
arXiv 2023
Fast Vision Transformers with HiLo Attention
arXiv 2022
EcoFormer: Energy-Saving Attention with Linear Complexity
arXiv 2022
Mesa: A Memory-saving Training Framework for Transformers
arXiv 2021
Scalable Vision Transformers with Hierarchical Pooling
ICCV 2021 10
Frequent co-authors
10 from 40 papers