0

The Optimization Landscape of Carathéodory Decomposition of Toeplitz Covariances

Toeplitz covariance estimation is a classical problem in statistical signal processing, yet the geometry of the Gaussian maximum-likelihood objective remains only partially understood.

Preview
Year
2025
Hosting
Full text hostedCC-BY-SA-4.0

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2511.01605CC-BY-SA-4.0
TL;DR
Semantic Scholar
Attribution policy →

Abstract

Toeplitz covariance estimation is a classical problem in statistical signal processing, yet the geometry of the Gaussian maximum-likelihood objective remains only partially understood. Recent algorithms, including Newton-type, majorization-minimization, and gradient-based methods, indicate that the nonconvex problem can often be globally solved when the number of samples is sufficiently large, but they also reveal a difficult computational landscape. In this work, we study this phenomenon through an overparameterized Caratheodory representation of positive definite Toeplitz covariance matrices. The Caratheodory decomposition parameterizes the covariance using a combination of steering vectors with different frequencies and amplitudes. Our first result shows that fixed-grid amplitude optimization is fundamentally insufficient. Even in the population setting, and even with arbitrarily many fixed frequency grid points, amplitude-only optimization can have a strictly positive error floor under grid mismatch. This motivates optimizing both amplitudes and frequencies. In this case, our main theoretical result proves that the joint optimization has a benign population landscape: every stationary point that produces a positive definite covariance matrix recovers the true Toeplitz covariance. These findings suggest a simple interpretation of the Toeplitz covariance problem: the population landscape is globally benign, but may be highly ill-conditioned. In our numerical experiments, overparameterization improves convergence speed and finite-sample accuracy. In particular, it allows simple gradient descent to approach the Cramer Rao bound while keeping the implementation simple.