Sunday, October 20, 2019

Explaining neural network results

One recurring challenge for neural networks is explaining their results. In this article we describe how an explanation can be obtained from an image classifier: we find the key pixels which, if changed, would cause the classifier to give a different result.
First, let's fix the notation we'll use. An input image may be grey-scale (a single matrix) or may have three colour channels, red, green and blue, represented by three matrices R, G, B. Either way, the input can be vectorized: \[\{\textbf{R},\textbf{G},\textbf{B} \} \rightarrow \textbf{X} \] We assume we have a neural-net image classifier which estimates the probability that the image contains each of a set of given objects: \[ 0 \le C_a( \textbf{X} ) \le 1 \] For example, \( C_1( \textbf{X} ) \) could be the likelihood that the image contains a cat,
and \( C_2( \textbf{X} ) \) the likelihood that it contains a dog.
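As a concrete sketch of the vectorization step (the 4×4 image size here is made up purely for illustration):

```python
import numpy as np

# Hypothetical 4x4 image with three colour channels, values in [0, 1].
R = np.random.rand(4, 4)
G = np.random.rand(4, 4)
B = np.random.rand(4, 4)

# Flatten each channel and concatenate them into a single input vector X.
X = np.concatenate([R.ravel(), G.ravel(), B.ravel()])
# X.shape is (48,): 3 channels x 16 pixels each.
```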
Suppose a cat is deemed the most likely object, so that \[ C_1( \textbf{X} ) \gt C_a( \textbf{X} ), \hspace{1cm} \forall a \ne 1 \] Now suppose we want an explanation of why a cat was chosen. We start by taking the partial derivatives of \( C_1 \) with respect to each of the \(X_i\)'s. This gradient tells us which pixels to change.
We take the gradient: \[ \nabla_i C_1 = \frac{\partial C_1}{\partial X_i}\] and define the unit vector: \[\textbf{N} = \frac{\mathbf{\nabla} C_1}{\vert \mathbf{\nabla} C_1 \vert} \] We can now make an adjustment to the image: \[ \textbf{X} \rightarrow \textbf{X} - \epsilon \textbf{N} \] The minus sign is there because we want to decrease the value of \(C_1\).
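A minimal numerical sketch of one such adjustment step. The classifier `C1` below is a made-up stand-in (any smooth function of \(\textbf{X}\) will do for illustration), and the gradient is estimated by central finite differences rather than back-propagation:

```python
import numpy as np

def C1(X):
    # Stand-in for the classifier's "cat" score; a real implementation
    # would evaluate the network itself.
    return 1.0 / (1.0 + np.exp(-X.sum()))

def numerical_gradient(f, X, h=1e-5):
    # Central-difference estimate of the gradient of f at X.
    grad = np.zeros_like(X)
    for i in range(X.size):
        e = np.zeros_like(X)
        e[i] = h
        grad[i] = (f(X + e) - f(X - e)) / (2 * h)
    return grad

def gradient_step(X, epsilon=0.01):
    grad = numerical_gradient(C1, X)
    N = grad / np.linalg.norm(grad)   # unit vector along the gradient
    return X - epsilon * N            # step against the gradient to lower C1

X = np.random.rand(12)
X_new = gradient_step(X)
```

For a small enough step size \(\epsilon\), the score after the step is strictly lower than before it.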
With the new \( \textbf{X}\) we recalculate \(C_1\), recalculate the gradient, and adjust \( \textbf{X}\) again.
We continue adjusting \(\textbf{X}\) until \(C_1\) is no longer the largest \(C_a\). We will then have our adjusted image, which is less cat-like. We can then display the adjusted and original images to the user, highlighting the changes, which shows the user why the image was deemed to be a cat.
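The whole loop can be sketched as follows. The `scores` and `grad_top` functions and the two-class softmax "classifier" below are toy assumptions standing in for a real network and its back-propagated gradient:

```python
import numpy as np

def explain(X, scores, grad_top, epsilon=0.01, max_steps=10000):
    """Nudge X against the top class's gradient until another class wins.

    scores(X)   -> array of class scores C_a(X)            (assumed interface)
    grad_top(X) -> gradient of the top class's score at X  (assumed interface)
    Returns the adjusted image and the per-pixel change to highlight.
    """
    target = int(np.argmax(scores(X)))
    X_adj = X.astype(float).copy()
    for _ in range(max_steps):
        if int(np.argmax(scores(X_adj))) != target:
            break                     # the top class has changed: stop
        g = grad_top(X_adj)
        X_adj -= epsilon * g / np.linalg.norm(g)
    return X_adj, np.abs(X_adj - X)

# Toy two-class "classifier": softmax over two linear scores.
w1 = np.array([1.0, 0.5])             # "cat" weights (made up)
w2 = np.array([0.2, 1.0])             # "dog" weights (made up)

def scores(X):
    z = np.array([w1 @ X, w2 @ X])
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_top(X):
    p = scores(X)
    return p[0] * (1.0 - p[0]) * (w1 - w2)   # d C_1 / dX for this softmax

X0 = np.array([2.0, 0.0])             # initially classified as "cat"
X_adj, diff = explain(X0, scores, grad_top)
```

After the loop, `diff` holds the per-pixel magnitude of the change, which is what we would highlight when showing the adjusted image next to the original.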