
Interpretability of Neural Networks Based on Game-theoretic Interactions

This paper introduces the system of game-theoretic interactions, which connects the explanation of knowledge encoded in a deep neural network (DNN) with the explanation of the representation power of a DNN. In this system, we define two game-theoretic interaction indexes, namely the multi-order interaction and the multivariate interaction. More crucially, we use these interaction indexes to explain feature representations encoded in a DNN from the following four aspects: 1) Quantifying knowledge concepts encoded by a DNN; 2) Exploring how a DNN encodes visual concepts, and extracting prototypical concepts encoded in the DNN; 3) Learning optimal baseline values for the Shapley value, and providing a unified perspective to compare fourteen different attribution methods; 4) Theoretically explaining the representation bottleneck of DNNs. Furthermore, we prove the relationship between the interaction encoded in a DNN and the representation power of a DNN (e.g., generalization power, adversarial transferability, and adversarial robustness). In this way, game-theoretic interactions successfully bridge the gap between "the explanation of knowledge concepts encoded in a DNN" and "the explanation of the representation capacity of a DNN" as a unified explanation.
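As a rough illustration of the game-theoretic quantities the abstract refers to, the sketch below computes exact Shapley values for a tiny cooperative game, together with a generic second-order difference between two players. It is a minimal sketch under stated assumptions: `toy_game`, `shapley_values`, and `pairwise_difference` are hypothetical names introduced here, the toy game stands in for a DNN output on masked inputs, and the second-order difference is a standard textbook quantity, not necessarily the multi-order or multivariate interaction index defined in the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, v):
    """Exact Shapley values for players 0..n-1 under value function v(frozenset)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Standard Shapley weight for a coalition of size k that excludes i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (v(s | {i}) - v(s))
    return phi


def pairwise_difference(v, i, j, context=frozenset()):
    """Second-order difference of v for players i and j given a fixed context.

    Illustrative only: measures how much the joint presence of i and j
    adds beyond their individual effects, with the context held fixed.
    """
    s = frozenset(context)
    return v(s | {i, j}) - v(s | {i}) - v(s | {j}) + v(s)


def toy_game(s):
    # Hypothetical game: players 0 and 1 yield a reward of 1.0 only together;
    # player 2 contributes 0.5 on its own.
    return (1.0 if {0, 1} <= s else 0.0) + (0.5 if 2 in s else 0.0)


phi = shapley_values(3, toy_game)
joint_01 = pairwise_difference(toy_game, 0, 1)
joint_02 = pairwise_difference(toy_game, 0, 2)
```

In this toy game the reward of players 0 and 1 comes entirely from their cooperation, so their second-order difference is nonzero while that of players 0 and 2 is zero. The enumeration is exponential in the number of players, so it is only practical for very small games.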

Model interpretability and transparency; explainable AI; game theory; interaction; deep learning

Huilin Zhou, Jie Ren, Huiqi Deng, Xu Cheng, Jinpeng Zhang, Quanshi Zhang

School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

XLAB, The Second Academy of China Aerospace Science and Industry Corporation, Beijing 100854, China

National Science and Technology Major Project (2021ZD0111602)
National Natural Science Foundation of China (62276165)
National Natural Science Foundation of China (U19B2043)
Shanghai Natural Science Foundation, China (21JC1403800)
Shanghai Natural Science Foundation, China (21ZR1434600)

2024

Machine Intelligence Research (English edition)
Institute of Automation, Chinese Academy of Sciences

Indexed in: CSTPCD, EI
Impact factor: 0.49
ISSN: 2731-538X
Year, volume (issue): 2024, 21(4)