We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
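The claim that the method "just needs a few calls to the standard gradient operator" can be illustrated with a small sketch. In the full paper, Integrated Gradients attributes to each feature i the quantity (x_i - x'_i) times the integral, over alpha from 0 to 1, of the partial derivative of F at x' + alpha(x - x'), where x' is a baseline input. The code below approximates that integral with a midpoint Riemann sum, so each step costs one gradient call. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy function `toy_f`, its hand-written gradient `toy_grad` (standing in for a network and its autodiff gradient), the zero baseline, the midpoint rule, and `steps=50` are all illustrative choices, and every name here is hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def integrated_gradients(
    grad_f: Callable[[Vector], Vector],
    x: Vector,
    baseline: Vector,
    steps: int = 50,
) -> Vector:
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    attr_i = (x_i - baseline_i) * mean_k grad_f(baseline + a_k * (x - baseline))_i,
    where a_k = (k + 0.5) / steps for k = 0 .. steps - 1.
    """
    if len(x) != len(baseline):
        raise ValueError("x and baseline must have the same length")
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    total = [0.0] * len(x)
    for k in range(steps):
        # Midpoint of the k-th sub-interval of the straight-line path.
        alpha = (k + 0.5) / steps
        point = [bi + alpha * di for bi, di in zip(baseline, diff)]
        g = grad_f(point)  # one call to the gradient operator per step
        for i, gi in enumerate(g):
            total[i] += gi
    # Average gradient along the path, scaled by the input-minus-baseline difference.
    return [di * ti / steps for di, ti in zip(diff, total)]


# Hypothetical toy "model": F(x) = x0**2 + 3*x0*x1, with its gradient written by hand.
def toy_f(x: Vector) -> float:
    return x[0] ** 2 + 3 * x[0] * x[1]


def toy_grad(x: Vector) -> Vector:
    return [2 * x[0] + 3 * x[1], 3 * x[0]]


x = [1.0, 2.0]
baseline = [0.0, 0.0]
attr = integrated_gradients(toy_grad, x, baseline)
print(attr)
print(sum(attr), toy_f(x) - toy_f(baseline))
```

For this toy function the gradient is linear along the path, so the midpoint rule is exact up to floating-point rounding and the attributions come out at about 4.0 and 3.0. Their sum matches F(x) - F(baseline) = 7, the property the full paper calls Completeness. For a real network, `toy_grad` would be replaced by the framework's gradient call, and the number of steps trades accuracy for cost.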
Axiomatic Attribution for Deep Networks
A new attribution method, Integrated Gradients, is proposed to satisfy the Sensitivity and Implementation Invariance axioms, supporting model debugging, rule extraction, and user engagement across image, text, and chemistry models.
- Year: 2017
- Authors: 3
- Hosting: Abstract only (arXiv)
Attribution
- Abstract & full text: arxiv.org/abs/1703.01365v2
- TL;DR: Semantic Scholar