Stochastic gradient updates are widely used for their efficiency and scalability, but their effective step sizes can depend strongly on feature scaling and local model sensitivity. Gauss-Newton methods address such scale effects through curvature information, but in their standard mini-batch form they require matrix-vector products, linear solves, or structured approximations. This paper studies the special case of scalar-output losses evaluated one sample at a time. In this setting, the generalized Gauss-Newton matrix has rank at most one, and its only possible nonzero curvature direction is aligned with the stochastic gradient. As a result, the damped Gauss-Newton direction reduces to a closed-form scalar normalization of the sample gradient. The resulting update, Incremental Gauss-Newton Descent (IGND), requires no curvature matrix storage, factorization, or iterative linear solve. We derive the update, characterize its behavior, and relate it to normalized gradient descent, adaptive first-order methods, stochastic Polyak step sizes, and mini-batch Gauss-Newton updates. Under explicit smoothness, alignment, and stochastic approximation assumptions, we prove a stationarity result for the IGND update. Experiments on supervised learning, a controlled test of scale robustness, and a linear-quadratic control case study show that IGND improves robustness to sensitivity scaling and can be competitive with, or complementary to, common stochastic optimizers while retaining a simple incremental update.
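The closed-form normalization described in the abstract can be illustrated with a minimal sketch. It assumes a linear model f(x) = w . x and the squared loss 0.5 * (f - y)^2, so the per-sample generalized Gauss-Newton matrix is `curvature * g g^T` with g the gradient of the model output. The function names (`ignd_direction`, `ggn_times`, `ignd_step`) and the parameters `damping` and `lr` are hypothetical and chosen for illustration; they are not taken from the paper, whose exact update may differ in detail.

```python
def ignd_direction(grad_f, residual, curvature=1.0, damping=1e-3):
    """Closed-form damped Gauss-Newton direction for a single sample.

    grad_f    : gradient of the scalar model output w.r.t. the parameters
    residual  : dloss/doutput (f - y for the assumed squared loss)
    curvature : d2loss/doutput2 (1.0 for the assumed squared loss)
    damping   : hypothetical damping constant added to the curvature
    """
    sq_norm = sum(g * g for g in grad_f)
    # Solving (curvature * g g^T + damping * I) d = residual * g
    # via the rank-one structure gives a scalar multiple of g.
    scale = residual / (damping + curvature * sq_norm)
    return [scale * g for g in grad_f]


def ggn_times(grad_f, vec, curvature=1.0, damping=1e-3):
    """Apply (curvature * g g^T + damping * I) to vec without forming the matrix."""
    dot = sum(g * v for g, v in zip(grad_f, vec))
    return [curvature * g * dot + damping * v for g, v in zip(grad_f, vec)]


def ignd_step(w, x, y, lr=1.0, damping=1e-3):
    """One incremental update for the assumed linear model with squared loss."""
    f = sum(wi * xi for wi, xi in zip(w, x))
    # For a linear model the gradient of the output w.r.t. w is x itself.
    direction = ignd_direction(x, f - y, 1.0, damping)
    return [wi - lr * di for wi, di in zip(w, direction)]


if __name__ == "__main__":
    # Same sample at two feature scales: the normalized step fits both.
    for scale in (1.0, 1000.0):
        x = [3.0 * scale, 4.0 * scale]
        w = ignd_step([0.0, 0.0], x, 5.0, lr=1.0, damping=0.0)
        print(scale, sum(wi * xi for wi, xi in zip(w, x)))
```

Multiplying the returned direction by the damped matrix (via `ggn_times`) recovers the sample gradient, which is one way to check that no matrix storage or linear solve is needed. With `lr=1.0` and zero damping, a single step fits the sample regardless of how the features are scaled, which is the scale robustness the abstract refers to in this simplified setting.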
Incremental Gauss-Newton Descent for Machine Learning
- Year
- 2026
Attribution
- Abstract & full text
- arxiv.org/abs/2408.05560
- TL;DR
- Semantic Scholar