Never Skip a Batch: Dense Learning of Temporal GNNs via Adaptive Pseudo-Supervision

Temporal graph networks suffer from irregular supervision in real-world dynamic graphs, as most minibatches contain few labeled events. This label sparsity leads to high-variance gradient updates and, consequently, slow wall-clock convergence. To densify supervision, our method, Moving-Averaged Labels (MAL), assigns soft pseudo-targets derived from past supervised signals via a running label distribution, while leaving the loss and the model architecture unchanged. Supervision gaps are thus filled with informative signals, regardless of the temporal graph model and of the message-passing or memory components it uses. Theoretical analysis supports our insight that aggregating historical supervision into moving-average targets reduces stochastic gradient variance, yielding faster convergence under mild assumptions. Experimentally, for the TGNv2 and DyRepv2 (our modification of DyRep) models, MAL improves predictive performance, establishing a new state of the art, and improves time-to-accuracy (reaching the top score 6x faster on average) on a common suite of Temporal Graph Benchmark datasets.
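
To make the idea of a running label distribution concrete, the following is a minimal sketch and not the exact MAL procedure. It assumes a per-node exponential moving average with a fixed momentum and a uniform initial distribution; the class name `MovingAverageLabels`, its methods, and the `momentum` parameter are hypothetical and introduced only for illustration.

```python
import numpy as np


class MovingAverageLabels:
    """Hypothetical helper: keeps one running label distribution per node."""

    def __init__(self, num_nodes, num_classes, momentum=0.9):
        self.momentum = momentum
        # Assumption: every node starts from a uniform distribution.
        self.running = np.full((num_nodes, num_classes), 1.0 / num_classes)

    def update(self, node, label_dist):
        """Blend a newly observed label distribution into the node's running average."""
        label_dist = np.asarray(label_dist, dtype=float)
        self.running[node] = (
            self.momentum * self.running[node]
            + (1.0 - self.momentum) * label_dist
        )

    def target(self, node, label_dist=None):
        """Return the true label if one is given, else the running soft pseudo-target."""
        if label_dist is not None:
            return np.asarray(label_dist, dtype=float)
        return self.running[node].copy()


# Usage: node 0 receives one real label; node 1 has never been labeled.
mal = MovingAverageLabels(num_nodes=3, num_classes=2, momentum=0.5)
mal.update(0, [1.0, 0.0])
soft_target_seen = mal.target(0)    # pulled toward the observed label
soft_target_unseen = mal.target(1)  # still the uniform initial distribution
```

In such a scheme, events without a label would be trained against `target(node)` with the same loss used for labeled events, so every batch contributes a gradient signal while the loss and the model stay unchanged.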