Generated Contents Enrichment

We study Generated Contents Enrichment (GCE), a conditional image-generation task in which a sparse scene description is first enriched through an explicit scene representation and then rendered into semantically richer visual content. Conventional image-generation systems can produce visually realistic outputs from limited scene descriptions, but the content they add is usually implicit in the generator rather than represented as an inspectable intermediate structure. In contrast, GCE seeks to make enrichment explicit at the scene-representation level and to examine its visual consequences during generation, with the goal of producing content that is visually plausible, structurally coherent, and semantically richer than the sparse input. To instantiate GCE, we propose a jointly trained adversarial framework that enriches scene graphs by modeling object semantics and inter-object relations. Our approach first represents the input description as a scene graph, in which nodes model objects and edges capture inter-object relations. It then uses graph convolutional networks to predict additional objects and their relations to the existing scene. Finally, the enriched scene graph is passed through the downstream image-generation pipeline to generate the corresponding visual content. We evaluate the framework on the Visual Genome dataset using proxy scene-graph enrichment metrics, image-quality comparisons, qualitative examples, and user studies.
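
To make the scene-graph representation concrete, the following is a minimal sketch of a sparse graph and its enriched counterpart. It is illustrative only and is not the framework described above: the names SceneGraph and enrich are hypothetical, and the hard-coded proposals stand in for the additional objects and relations that the learned graph convolutional model would predict.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneGraph:
    """Objects are nodes; (subject, predicate, object) index triples are directed edges."""
    objects: List[str] = field(default_factory=list)
    relations: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, label: str) -> int:
        """Append a node and return its index."""
        self.objects.append(label)
        return len(self.objects) - 1

    def add_relation(self, subj: int, predicate: str, obj: int) -> None:
        """Append an edge between two existing nodes."""
        n = len(self.objects)
        if not (0 <= subj < n and 0 <= obj < n):
            raise IndexError("relation refers to an unknown object")
        self.relations.append((subj, predicate, obj))

    def triples(self) -> List[Tuple[str, str, str]]:
        """Return the edges with node indices resolved to labels."""
        return [(self.objects[s], p, self.objects[o]) for s, p, o in self.relations]


def enrich(graph: SceneGraph, proposals: List[Tuple[str, str, int]]) -> SceneGraph:
    """Return a copy of `graph` extended with proposed objects.

    Each proposal is (new_label, predicate, existing_index): a new object that is
    the subject of a relation to an object already in the scene. In a learned
    system these proposals would be model predictions (assumption: here they are
    supplied by hand).
    """
    enriched = SceneGraph(list(graph.objects), list(graph.relations))
    for label, predicate, anchor in proposals:
        new_index = enriched.add_object(label)
        enriched.add_relation(new_index, predicate, anchor)
    return enriched


# Sparse input description: "sky above grass".
sparse = SceneGraph()
sky = sparse.add_object("sky")
grass = sparse.add_object("grass")
sparse.add_relation(sky, "above", grass)

# Hand-written stand-ins for predicted objects and relations.
enriched = enrich(sparse, [("sheep", "standing on", grass), ("cloud", "in", sky)])
for triple in enriched.triples():
    print(triple)
```

The enriched graph keeps every original node and edge and only adds to them, which is what makes the added content inspectable before any image is generated.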