0

A unifying Bayesian framework for adversarial robustness

The vulnerability of machine learning models to adversarial attacks remains a critical societal security challenge. Traditional defenses, such as adversarial training, typically robustify models by minimizing a worst-case loss.

Preview
Year
2025
Hosting
Excerpt onlyCC-BY-NC-4.0

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2510.09288CC-BY-NC-4.0
TL;DR
Semantic Scholar
Attribution policy →

Abstract

The vulnerability of machine learning models to adversarial attacks remains a critical societal security challenge. Traditional defenses, such as adversarial training, typically robustify models by minimizing a worst-case loss. These deterministic approaches do not account for uncertainty in the adversary's attack. While stochastic defenses placing a probability distribution on the adversary exist, they often lack statistical rigor and fail to make explicit their underlying assumptions. To resolve these issues, we introduce a formal Bayesian framework that models adversarial uncertainty through a stochastic channel, articulating all probabilistic assumptions. This yields two robustification strategies: a proactive defense enacted during training, aligned with adversarial training, and a reactive defense enacted during operations, aligned with adversarial purification. Several state-of-the-art defenses can be recovered as limiting cases of our model. We empirically validate our methodology, showcasing the benefits of explicitly modeling adversarial uncertainty.