Bin Zhu
Papers: 13
Authored papers
SAM3-LiteText: An Anatomical Study of the SAM3 Text Encoder for Efficient Vision-Language Segmentation
arXiv 2026
PulseLM: A Foundation Dataset and Benchmark for PPG-Text Learning
arXiv 2026
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
arXiv 2025
SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video
arXiv 2025
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward
arXiv 2025
Open-Sora Plan: Open-Source Large Video Generation Model
arXiv 2024
Next Patch Prediction for Autoregressive Visual Generation
arXiv 2024
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
arXiv 2023
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
arXiv 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
arXiv 2023
CgT-GAN: CLIP-guided Text GAN for Image Captioning
arXiv 2023
Towards Attack-tolerant Federated Learning via Critical Parameter Analysis
ICCV 2023