What Do Neural Networks Learn for TDOA Estimation? A Cross-Architecture Probing Study

Neural networks outperform classical GCC-PHAT for Time-Difference-of-Arrival (TDOA) estimation in noise and reverberation, yet their internal strategy remains unexplored. To uncover it, we turn GCC-PHAT's mathematical steps into diagnostic targets, probing the hidden layers of three architectures (MLP, CNN, Transformer) and complementing the probes with gradient attribution and causal frequency masking. We find that cross-power computation consistently emerges across all architectures and conditions, while PHAT whitening, the defining step of GCC-PHAT, fails to emerge. Instead, networks learn a magnitude-aware frequency weighting that preserves the per-frequency reliability information that PHAT discards. This makes PHAT an information bottleneck: removing it from both classical and neural GCC pipelines improves performance under additive noise. On real-world reverberant data, PHAT remains the best classical weighting, but end-to-end networks achieve lower error by learning a data-adaptive weighting.
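
For readers unfamiliar with the classical pipeline, the GCC-PHAT steps referred to above (cross-power computation, PHAT whitening, inverse transform, peak picking) can be sketched in NumPy as follows. This is a minimal textbook-style illustration, not the implementation used in this study; the function name `gcc_tdoa`, the `phat` switch, the zero-padding length, and the `eps` regularizer are all assumptions made for the example.

```python
import numpy as np

def gcc_tdoa(x1, x2, fs, phat=True, eps=1e-12):
    """Estimate the delay of x2 relative to x1, in seconds.

    Hypothetical helper for illustration only. With phat=True this is
    GCC-PHAT; with phat=False the magnitude of the cross-power spectrum
    is kept (plain cross-correlation).
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    # Step 1: cross-power spectrum.
    cross = X2 * np.conj(X1)

    # Step 2 (optional): PHAT whitening keeps only the phase, so every
    # frequency bin contributes with unit magnitude.
    if phat:
        cross = cross / (np.abs(cross) + eps)

    # Step 3: back to the lag domain.
    cc = np.fft.irfft(cross, n=n)

    # Reorder so the array covers lags -max_lag .. +max_lag.
    max_lag = n // 2
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))

    # Step 4: the peak location is the delay estimate in samples.
    lag = int(np.argmax(cc)) - max_lag
    return lag / fs
```

Calling the function with `phat=False` skips the whitening step, which is one simple way to compare a weighting that retains per-frequency magnitude against the phase-only PHAT weighting on the same pair of signals.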