Multi-Modal Spatio-Temporal Graph Neural Network with Mixture of Experts for Soil Organic Carbon Prediction

Top-soil organic carbon (SOC) prediction is fundamental to agricultural sustainability, land use policy and fertilization planning. Existing approaches face two limitations: they pair hand-crafted covariates with classical ML or single-modal deep models that miss rich spectral and temporal information, and grid-based architectures ignore the irregular spatial structure of field measurements. We introduce SpTGNN, a multi-modal spatio-temporal graph neural network addressing both. SpTGNN represents soil measurements as nodes in a heterogeneous graph with three edge types (spatial proximity, spectral similarity, elevation), and applies relational graph attention to learn separate patterns per relation. A fine-tuned TerraMind encoder extracts node features from Sentinel-2, Sentinel-1 and DEM signals, combined with per-sample environmental covariates and learned positional and temporal embeddings. A sparse Mixture-of-Experts module fuses the four streams via top-k routing. Uncertainty is captured by pairing heteroscedastic regression (aleatoric) with deep ensembles (epistemic), and a Moran's I penalty regularizes spatial autocorrelation. We evaluate on a global SOC corpus split into three regional instances (\sim49k samples globally, Africa \sim26k, Europe \sim14k). Our 5-member deep ensemble reports R^2=0.762, RMSE =3.51\pm0.48 g/kg and MAPE =22.9% on the Africa test split, improving over a tabular XGBoost baseline; the best single checkpoint reaches validation R^2=0.864. Ablations confirm the heterogeneous graph, MoE fusion and fine-tuned backbone each contribute substantively, and the ensemble UQ stack achieves post-calibration ECE of 0.031 (hybrid) and 0.026 (β-NLL). To our knowledge, this is the first framework to unify foundation-model feature extraction, heterogeneous graph attention and decomposed uncertainty quantification for SOC estimation.