Adaptive Discounting of Training Time Attacks

Among the most insidious attacks on Reinforcement Learning (RL) solutions are training-time attacks (TTAs) that create loopholes and backdoors in the learned behaviour. Not limited to a simple disruption, constructive TTAs (C-TTAs) are now available, where the attacker forces a…

Year: 2024
ArXiv: arxiv.org/abs/2401.02652
URL: arxiv.org/abs/2401.02652v1
Hosting: External source, license unknown

Attribution

Abstract & full text: arxiv.org/abs/2401.02652v1
TL;DR: Semantic Scholar
