
pix2gestalt: Amodal Segmentation by Synthesizing Wholes

Pix2gestalt uses a conditional diffusion model, trained on a synthetic dataset of occluded objects paired with their whole counterparts, to perform zero-shot amodal segmentation that outperforms supervised approaches and to improve object recognition and 3D reconstruction under occlusion.

Year
2024
Venue
CVPR 2024
Authors
7

Attribution

Abstract & full text: arxiv.org/abs/2401.14398
TL;DR: Semantic Scholar

Abstract

We introduce pix2gestalt, a framework for zero-shot amodal segmentation, which learns to estimate the shape and appearance of whole objects that are only partially visible behind occlusions. By capitalizing on large-scale diffusion models and transferring their representations to this task, we learn a conditional diffusion model for reconstructing whole objects in challenging zero-shot cases, including examples that break natural and physical priors, such as art. As training data, we use a synthetically curated dataset containing occluded objects paired with their whole counterparts. Experiments show that our approach outperforms supervised baselines on established benchmarks. Our model can furthermore be used to significantly improve the performance of existing object recognition and 3D reconstruction methods in the presence of occlusions.
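The training-data idea in the abstract (occluded objects paired with their whole counterparts) can be illustrated with a toy sketch. It assumes binary masks stored as nested lists and pastes a hypothetical occluder over a fully visible object. The result is a visible (modal) mask, which is the input-side view, and an amodal mask, which is the whole-object target. The function names and the 10 to 90% occlusion filter are illustrative assumptions, not the authors' actual curation pipeline, and the sketch omits the diffusion model entirely.

```python
from typing import List, Tuple

# Binary mask: 1 = object pixel, 0 = background.
Mask = List[List[int]]


def paste_occluder(whole: Mask, occluder: Mask, dy: int, dx: int) -> Tuple[Mask, Mask]:
    """Hypothetical helper: place `occluder` at offset (dy, dx) over `whole`.

    Returns (visible_mask, amodal_mask). The amodal mask is the whole object's
    full extent (unchanged); the visible mask keeps only object pixels that
    the occluder does not cover. Occluder pixels outside the image are ignored.
    """
    h, w = len(whole), len(whole[0])
    visible = [row[:] for row in whole]
    for y, row in enumerate(occluder):
        for x, v in enumerate(row):
            ty, tx = y + dy, x + dx
            if v and 0 <= ty < h and 0 <= tx < w:
                visible[ty][tx] = 0  # object pixel hidden by the occluder
    amodal = [row[:] for row in whole]
    return visible, amodal


def occlusion_ratio(visible: Mask, amodal: Mask) -> float:
    """Fraction of the whole object's pixels hidden by the occluder."""
    total = sum(map(sum, amodal))
    shown = sum(map(sum, visible))
    return 0.0 if total == 0 else 1.0 - shown / total


def keep_pair(ratio: float, min_ratio: float = 0.1, max_ratio: float = 0.9) -> bool:
    """Assumed filter: keep pairs that are partially, not trivially or fully, occluded."""
    return min_ratio <= ratio <= max_ratio


if __name__ == "__main__":
    whole = [[0, 0, 0, 0],
             [1, 1, 1, 1],
             [1, 1, 1, 1],
             [0, 0, 0, 0]]
    occluder = [[1, 1],
                [1, 1]]
    visible, amodal = paste_occluder(whole, occluder, dy=1, dx=2)
    ratio = occlusion_ratio(visible, amodal)
    print(visible[1], ratio, keep_pair(ratio))
```

In this toy case the occluder hides the right half of the object, so half of the whole object's pixels are occluded and the pair passes the assumed filter. A model trained on such pairs sees only the visible region and is supervised to produce the whole object.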
