Let's first fix the notation we'll use. An incoming image may be grey-scale (a single matrix) or have three colour channels, red, green and blue, represented as three matrices R, G and B. Either way, the input can be vectorised: \{\textbf{R},\textbf{G},\textbf{B} \} \rightarrow \textbf{X}
We assume we have a neural-net image classifier that, for each class a, gives us an estimate of the probability that the image contains the corresponding object:
0 \le C_a( \textbf{X} ) \le 1
For example, C_1( \textbf{X} ) could be the likelihood that the image contains a cat,
and C_2( \textbf{X} ) the likelihood that it contains a dog.
Suppose a cat is deemed the most likely object, so C_1( \textbf{X} ) > C_a( \textbf{X} ), \hspace{1cm} \forall a \ne 1
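The setup above can be sketched as follows. The class scores and labels here are made-up illustrative values, standing in for a real classifier's outputs:

```python
import numpy as np

# Hypothetical class scores C_a(X); in practice these would come
# from a neural-net classifier evaluated on the image vector X.
scores = np.array([0.72, 0.18, 0.10])  # C_1, C_2, C_3
labels = ["cat", "dog", "bird"]

# Each score is a probability estimate, so it lies in [0, 1].
assert np.all((scores >= 0) & (scores <= 1))

# The predicted class is the one with the largest score.
predicted = labels[int(np.argmax(scores))]
print(predicted)  # cat
```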
Now suppose we want an explanation for why a cat was chosen. We could start by taking the partial derivatives of C_1 with respect to each pixel X_i. We will use this gradient when choosing which pixels to change.
We take the gradient, whose components are: \left( \nabla C_1 \right)_i = \frac{\partial C_1}{\partial X_i}
and we define:
\textbf{N} = \frac{\mathbf{\nabla} C_1}{\vert \mathbf{\nabla} C_1 \vert}
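A minimal sketch of computing and normalising this gradient, using a toy differentiable sigmoid score in place of the real network's C_1 (the weight vector w and input X are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8)   # hypothetical weights of the toy score
X = rng.normal(size=8)   # hypothetical vectorised image

def C1(X):
    # Toy stand-in for the classifier's cat score: sigmoid(w . X)
    return 1.0 / (1.0 + np.exp(-w @ X))

def grad_C1(X):
    # Analytic gradient of the sigmoid: dC1/dX_i = C1 (1 - C1) w_i
    p = C1(X)
    return p * (1.0 - p) * w

# N = grad C1 / |grad C1|: a unit vector pointing in the
# direction that increases C1 fastest.
g = grad_C1(X)
N = g / np.linalg.norm(g)
```

Normalising means the step size below is controlled entirely by \epsilon, independent of how large the raw gradient happens to be.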
we can now make an adjustment to the image:
\textbf{X} \rightarrow \textbf{X} - \epsilon \textbf{N}
The minus sign is there because we want to decrease the value of C_1.
With the new \textbf{X} we recalculate C_1, recompute the gradient, and adjust \textbf{X} again.
We continue adjusting \textbf{X} until C_1 is no longer the largest C_a. We will then have our adjusted image, which is less cat-like. We can then display the adjusted and original images to the user, highlighting the changes, which shows the user which pixels led the image to be deemed a cat.
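The whole procedure can be sketched end to end. A toy two-class softmax model stands in for the real classifier; the weights W, starting image X0, step size eps and step cap are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(2, 8))   # hypothetical class weight vectors (cat, dog)
X0 = W[0] - W[1]              # chosen so "cat" is initially the top class

def probs(X):
    # Softmax over the two linear class scores.
    z = W @ X
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_C1(X):
    # Gradient of the softmax output C_1 with respect to X:
    # dC_1/dX = C_1 (W_1 - sum_a C_a W_a)
    p = probs(X)
    return p[0] * (W[0] - p @ W)

X = X0.copy()
eps = 0.05
steps = 0
# Repeatedly nudge X against the normalised gradient of C_1
# until "cat" is no longer the top class (with a safety cap).
while np.argmax(probs(X)) == 0 and steps < 10_000:
    g = grad_C1(X)
    X = X - eps * g / np.linalg.norm(g)
    steps += 1

# The difference is what we would highlight for the user:
# the pixel changes that flipped the classification.
delta = X - X0
```

The loop terminates once the argmax flips, so the final image is the least-changed one (at this step size) that the classifier no longer calls a cat.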