Weiming Zhang
- Papers: 16
Authored papers
M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment
arXiv 2025
LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer
arXiv 2025
OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation
arXiv 2025
ColorAgent: Building A Robust, Personalized, and Interactive OS Agent
arXiv 2025
De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks
arXiv 2025
ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing
arXiv 2025
EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers
arXiv 2024
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
CVPR 2024
Diversity-Aware Meta Visual Prompting
CVPR 2023
Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting
ICCV 2023
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models
arXiv 2023
Watermarking Text Generated by Black-Box Language Models
arXiv 2023
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
arXiv 2022
HairCLIP: Design Your Hair by Text and Reference Image
CVPR 2022
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
CVPR 2022
DeepFaceLab: Integrated, flexible and extensible face-swapping framework
arXiv 2020