Controllable Diffusion-Based Lesion Inpainting for Scalable Histopathology Data Augmentation

Year: 2026
Full text: arxiv.org/abs/2601.08127 (CC-BY-NC-SA-4.0)

Abstract

Expert-annotated training data remains the critical bottleneck for AI in histopathology, particularly for rare pathologies where even dozens of cases may be unavailable. While data augmentation offers a solution, existing methods fail to generate sufficiently realistic lesion morphologies that preserve tissue-specific architectures. Here we present PathoGen, a diffusion-based generative model enabling controllable, high-fidelity lesion inpainting into benign histopathology images. We validate PathoGen across four datasets representing kidney, skin, breast, and prostate pathology. Quantitative assessment confirms that PathoGen outperforms state-of-the-art baselines in image fidelity and distributional similarity. In an evaluation by six expert pathologists, synthetic images from PathoGen were distinguished from real tissue images only slightly above chance (57.75% accuracy), demonstrating the strong perceptual realism of PathoGen-generated lesions. PathoGen also achieved the highest win rate (35.4%) when pathologists ranked generation quality against all baselines. Crucially, augmenting training sets with PathoGen-synthesized lesions improves segmentation Dice scores by up to 0.18 compared to traditional augmentations, with maximum benefit in data-scarce regimes. By simultaneously generating realistic morphology and pixel-level annotations, PathoGen addresses both data scarcity and annotation cost, two critical bottlenecks in computational pathology development.
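
For readers unfamiliar with the segmentation metric quoted above, the following is a minimal sketch of the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), for two binary masks. It is illustrative only and is not taken from the paper: the function name `dice_score`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are all assumptions made here.

```python
from typing import Sequence


def dice_score(pred: Sequence[Sequence[int]],
               target: Sequence[Sequence[int]]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks.

    Hypothetical helper for illustration; masks are nested sequences
    of 0/1 values with identical shape.
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels labelled lesion in both masks
    pred_area = 0     # pixels labelled lesion in the prediction
    target_area = 0   # pixels labelled lesion in the ground truth
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p, t = bool(p), bool(t)
            intersection += int(p and t)
            pred_area += int(p)
            target_area += int(t)

    total = pred_area + target_area
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x3 masks: 2 overlapping pixels, 3 predicted, 3 in the ground truth.
pred_mask = [[1, 1, 0], [0, 1, 0]]
true_mask = [[1, 0, 0], [0, 1, 1]]
score = dice_score(pred_mask, true_mask)
```

On this scale, which runs from 0 (no overlap) to 1 (identical masks), the reported gain of up to 0.18 is an absolute difference between two such scores.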