StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

The paper improves text-based image editing with pretrained diffusion models by optimizing only the input to the value linear network in the cross-attention layers and by regularizing attention maps, which gives accurate style edits without significant structural changes to the objects in the image.

Year: 2023
Venue: arXiv 2023
Authors: 8

Abstract & full text: arxiv.org/abs/2303.15649v3

Abstract

A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images. Existing methods either finetune the model or invert the image in the latent space of the pretrained model. However, they suffer from two problems: (1) unsatisfying results for selected regions and unexpected changes in non-selected regions; (2) they require careful text prompt editing, where the prompt should include all visual objects in the input image. To address this, we propose two improvements: (1) optimizing only the input of the value linear network in the cross-attention layers is sufficiently powerful to reconstruct a real image; (2) we propose attention regularization to preserve the object-like attention maps after reconstruction and editing, enabling us to obtain accurate style editing without invoking significant structural changes. We further improve the editing technique that is used for the unconditional branch of classifier-free guidance, as used by P2P. Extensive experimental prompt-editing results on a variety of images demonstrate qualitatively and quantitatively that our method has superior editing capabilities compared to existing and concurrent works. See our accompanying code at StyleDiffusion: https://github.com/sen-mao/StyleDiffusion.
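
The first improvement in the abstract relies on a property of cross-attention: the attention map is computed from queries and keys alone, so changing only the input to the value projection alters the output features without touching the map of that layer. The toy sketch below illustrates this in pure Python. It is not the authors' implementation; the function names, the identity projection weights, the `learned_value_input` values, and the mean-squared penalty in `attention_l2` are all assumptions chosen for illustration. The abstract does not specify the form of the attention regularizer.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def cross_attention(queries, key_input, value_input, w_k, w_v):
    """Single-head cross-attention with separate inputs for keys and values.

    queries:     (positions x d) image features, assumed already projected
    key_input:   (tokens x e) embedding fed to the key projection
    value_input: (tokens x e) embedding fed to the value projection
    Returns (attention_map, output).
    """
    keys = matmul(key_input, w_k)
    values = matmul(value_input, w_v)
    scale = 1.0 / math.sqrt(len(keys[0]))
    scores = [[s * scale for s in row] for row in matmul(queries, transpose(keys))]
    attn = softmax_rows(scores)
    return attn, matmul(attn, values)

def attention_l2(attn_a, attn_b):
    """Mean squared difference between two attention maps (hypothetical penalty)."""
    diffs = [(x - y) ** 2 for ra, rb in zip(attn_a, attn_b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

# Toy sizes: 2 image positions, 3 prompt tokens, feature width 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
text_embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Stand-in for an optimized embedding that replaces only the value input.
learned_value_input = [[0.5, -0.5], [2.0, 0.0], [0.0, 3.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # assumed projection weights

# Reference pass: the text embedding feeds both keys and values.
attn_ref, out_ref = cross_attention(queries, text_embedding, text_embedding, identity, identity)
# Modified pass: keys still come from the text embedding, values from the learned input.
attn_new, out_new = cross_attention(queries, text_embedding, learned_value_input, identity, identity)
```

In this single layer the two attention maps are identical while the outputs differ. In a full diffusion model the queries of later layers and timesteps depend on earlier outputs, so attention maps can still drift, which is one plausible reason a penalty such as `attention_l2` against a reference map would be useful.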
