Zero-Shot Text-to-Image Generation

A transformer autoregressively models text and image tokens as a single data stream and, when evaluated zero-shot, is competitive with earlier domain-specific text-to-image models.

Year
2021
Venue
arXiv 2021
Authors
8

Attribution

Abstract & full text
arxiv.org/abs/2102.12092v2
TL;DR
Semantic Scholar

Abstract

Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. With sufficient data and scale, our approach is competitive with previous domain-specific models when evaluated in a zero-shot fashion.
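
The "single stream" idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy token ids, the vocabulary size, and the choice to offset image token ids past the text vocabulary are all assumptions made for illustration. The sketch only shows how text and image tokens might be concatenated into one sequence and turned into next-token prediction pairs.

```python
from typing import List, Sequence, Tuple


def build_stream(text_tokens: Sequence[int],
                 image_tokens: Sequence[int],
                 text_vocab_size: int) -> List[int]:
    """Concatenate text and image tokens into one sequence.

    Assumption (not from the abstract): image token ids are shifted by
    text_vocab_size so the two vocabularies share one id space without
    colliding.
    """
    for t in text_tokens:
        if not 0 <= t < text_vocab_size:
            raise ValueError(f"text token {t} is outside the text vocabulary")
    for i in image_tokens:
        if i < 0:
            raise ValueError(f"image token {i} must be non-negative")
    return list(text_tokens) + [text_vocab_size + i for i in image_tokens]


def next_token_pairs(stream: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (inputs, targets) for autoregressive training.

    The target at position k is the token at position k + 1, so the model
    is asked to predict every token from the tokens before it.
    """
    return list(stream[:-1]), list(stream[1:])


# Toy example with hypothetical ids: 3 text tokens, 4 image tokens.
text = [5, 2, 9]
image = [0, 3, 1, 3]
stream = build_stream(text, image, text_vocab_size=10)
inputs, targets = next_token_pairs(stream)
```

Under these assumptions a single next-token objective covers both modalities: because the text comes first in the stream, every image token is predicted conditioned on the full caption and on the image tokens generated so far.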