Lightweight and Interpretable Transformer via Mixed Graph Algorithm Unrolling for Traffic Forecast

Unlike conventional "black-box" transformers with classical self-attention mechanisms, we build a lightweight and interpretable transformer-like neural net by unrolling a mixed-graph-based optimization algorithm to forecast traffic with spatial and temporal dimensions. We construct two graphs: an undirected graph G^u capturing spatial correlations across geography, and a directed graph G^d capturing sequential relationships over time. We predict future samples of signal x, assuming it is "smooth" with respect to both G^u and G^d; to this end, we design new \ell_2-norm and \ell_1-norm variational terms to quantify and promote signal smoothness (low-frequency reconstruction) on a directed graph. We design an iterative algorithm based on the alternating direction method of multipliers (ADMM), and unroll it into a feed-forward network for data-driven parameter learning. We periodically insert graph learning modules for G^u and G^d that play the role of self-attention. Experiments show that our unrolled networks achieve traffic forecast performance competitive with state-of-the-art prediction schemes, while reducing parameter counts drastically.
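
To make the notion of predicting a signal that is "smooth" on a graph concrete, the following is a minimal sketch, not the method of this paper. It assumes the standard graph Laplacian regularizer x^T L x on an undirected graph only, and replaces ADMM with a closed-form solve of a quadratic objective; the directed-graph \ell_2 and \ell_1 terms and the unrolled network are not reproduced. The names `laplacian`, `glr`, `predict_smooth`, and the weight `mu` are hypothetical and chosen for illustration.

```python
import numpy as np


def laplacian(W):
    """Combinatorial graph Laplacian L = D - W for a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W


def glr(x, L):
    """Graph Laplacian regularizer x^T L x; small values mean x is smooth on the graph."""
    return float(x @ L @ x)


def predict_smooth(y, observed, L, mu=1.0):
    """Fill in unobserved samples by solving
        min_x ||H x - y_obs||_2^2 + mu * x^T L x,
    where H selects the observed entries (an assumed, simplified objective).
    The normal equations are (H^T H + mu * L) x = H^T y_obs.
    """
    HtH = np.diag(observed.astype(float))  # H^T H is diagonal with 1 at observed nodes
    rhs = np.where(observed, y, 0.0)       # H^T y_obs, zero at unobserved nodes
    return np.linalg.solve(HtH + mu * L, rhs)


# Toy example: a 4-node path graph; the last node is the sample to predict.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = laplacian(W)
y = np.array([1.0, 2.0, 3.0, 0.0])               # last entry is a placeholder
observed = np.array([True, True, True, False])
x_hat = predict_smooth(y, observed, L, mu=0.5)
```

In this toy setting the unobserved node has a single neighbor, so the smoothness term alone determines it and `x_hat[3]` equals `x_hat[2]`. In an unrolled network, parameters such as `mu` and the graph weights would be learned from data rather than fixed by hand.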