AI-Augmented Thyroid Scintigraphy for Robust Classification of Disease

Thyroid scintigraphy is vital for diagnosing thyroid disorders, yet deep learning (DL) models in this domain often struggle with limited, imbalanced datasets. This study investigates the impact of three data augmentation strategies, Stable Diffusion (SD), Flow Matching (FM), and Conventional Augmentation (CA), on DL-based disease classification. Anterior thyroid scintigraphy images from 2,954 patients across nine medical centers were classified into four categories: Diffuse Goiter (DG), Nodular Goiter (NG), Normal (NL), and Thyroiditis (TI). Data augmentation was performed using CA as well as various SD and FM models, creating 18 distinct scenarios. Each augmented dataset was used to train a ResNet18 classifier. Classifier performance was assessed using class-wise and average precision, recall, F1-score, and AUC, and the fidelity of the generated images was assessed using the Fréchet Inception Distance (FID) and Kernel Inception Distance (KID). FM-based methods performed best: the Original dataset combined with FM (O+FM) achieved the highest micro, macro, and weighted F1-scores (0.78, 0.77, 0.78) and AUC values (0.95, 0.93, 0.94). Although the O+FM+CA configuration also yielded excellent, balanced results, O+FM was statistically superior, suggesting that high-fidelity generative augmentation can supersede conventional heuristics. FM also produced the most realistic images, achieving the lowest overall FID (0.66) and KID (0.83). Among the SD variants, SD1, which combines image and prompt inputs, was the most effective (macro F1: 0.76; FID: 4.17), suggesting that physician-generated prompts provide critical clinical context. Integrating FM and clinically informed SD augmentation substantially improves thyroid scintigraphy classification, highlighting the importance of advanced generative models for robust training on limited datasets. The code is available at: https://github.com/MaziarSabouri/Stable-Diffusion-Scintigraphy-Augmentation
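
The micro, macro, and weighted F1-scores reported above differ only in how per-class results are aggregated, which matters on imbalanced data. The following is a minimal sketch of the three averages for a single-label, four-class problem. It is not taken from the study's code: the function name `f1_averages` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Class abbreviations as used in the abstract.
CLASSES = ["DG", "NG", "NL", "TI"]


def f1_averages(y_true, y_pred, classes=CLASSES):
    """Return micro, macro, and weighted F1 for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in classes}
    support = Counter(y_true)

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: unweighted mean of the per-class F1 values.
    macro = sum(per_class.values()) / len(classes)
    # Weighted: mean of per-class F1 weighted by the number of true samples.
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    return {"micro": micro, "macro": macro, "weighted": weighted,
            "per_class": per_class}


# Toy, made-up labels (illustrative only, not study data).
y_true = ["DG", "DG", "NG", "NG", "NG", "NL", "NL", "NL", "NL", "TI"]
y_pred = ["DG", "NG", "NG", "NG", "NL", "NL", "NL", "NL", "NL", "NL"]
scores = f1_averages(y_true, y_pred)
print({k: round(v, 3) for k, v in scores.items() if k != "per_class"})
```

In this toy example the rare TI class is never predicted correctly, so the macro average falls well below the micro average. This is why reporting all three averages is informative when classes are imbalanced.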