In recent years researchers have found numerous ways to fool image classification software. An image of, say, a face can be tweaked slightly so that image recognition software detects something completely different, perhaps a rifle, even though to a human observer the face is still clearly visible and the rifle cannot be seen. In one respect, when AI is being fooled it is becoming more human: there are optical illusions that trick a human but not AI, and we have now found the reverse.
Let's now look at the mathematics of what is going on. We'll represent the image as a vector x, which can of course contain a value for red, green and blue at each pixel. The neural net estimates the likelihood of a rifle being in the image; we'll call this R(x). For the original input image this likelihood is low. The hacker can then work out the gradient, i.e. the partial derivative of R with respect to each element. This can be done numerically using a standard finite difference approach:
\[\frac{\partial R}{\partial x_i} \approx \frac{R(\underline{x} + \delta x_i) - R(\underline{x} - \delta x_i)}{2 \delta x_i} \]
(The purists might slightly argue with my notation there, but they know what I mean.)
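As a rough illustration, a black-box version of this gradient estimate might look like the sketch below. The function names, the flattened-vector representation of the image and the step size delta are my own assumptions; note also that two queries per pixel is very expensive for a full-sized image, so a real attacker would probably sample pixels or batch the queries.

```python
import numpy as np

def numerical_gradient(R, x, delta=1e-3):
    """Estimate dR/dx_i by central finite differences, treating R as a black box.

    R     : function mapping a flattened image vector to a scalar likelihood
    x     : 1-D numpy array holding the image (e.g. flattened RGB values)
    delta : size of the perturbation applied to one element at a time
    """
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[i] += delta
        x_minus[i] -= delta
        grad[i] = (R(x_plus) - R(x_minus)) / (2.0 * delta)
    return grad
```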
The hacker can then use an iterative gradient ascent method to tweak the image so that the net reports it to be more likely that a rifle is present. After each step, the gradient would be recalculated. One of the interesting features of this hack is that the hacker doesn't need to know the details of the architecture of the neural net; he just needs access to the output R(x). It is sometimes found that a small tweak to the original image results in the net reporting that a completely different object is present, even though to a human the tweaked image appears almost identical to the original.
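Continuing the sketch above, the attack loop might look like the following. The step size, iteration cap, target likelihood and the 0-1 pixel range are all assumptions for illustration.

```python
def gradient_ascent_attack(R, x0, step=0.01, iterations=100, target=0.9):
    """Repeatedly nudge the image uphill in R until the net reports a rifle."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iterations):
        if R(x) >= target:
            break                                 # the net is now "confident" a rifle is present
        grad = numerical_gradient(R, x)           # re-estimate the gradient after each step
        x = np.clip(x + step * grad, 0.0, 1.0)    # keep pixel values in a valid range
    return x
```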
But what can we do to improve the robustness of AI image classification software?
One option would be to use multiple neural networks with different architectures and different training data. If one of the nets detects a rifle but all the others detect a face, then we would suspect that the odd one out is being fooled. One of the nets could be trained only on grey-scale images, i.e. with the red, green and blue replaced by shades of grey. Before the combined neural nets could be deployed, an algorithm would need to be written to combine the results from the various nets. We may have one net that is the best, in which case it would only be overruled if all the other nets give a different result. However, one problem is that if the hacker were to return and attempt the same hack against our combined net, he might still be somewhat successful. Remember, we showed above that the hacker doesn't need to know any details of the neural net architecture, so replacing one net with a set of nets may not improve matters that much. We would deem our defense against the hack a success if the only way the hacker could convince our system that a rifle is in the image were to change the image so much that we would all agree that it does appear as though a rifle is present.
However, combining the results from multiple nets does not guarantee this.
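For concreteness, the "overrule the primary net only on unanimous disagreement" rule described above could be sketched like this. The net interfaces and the fallback to a most-common-answer vote are my own assumptions.

```python
def combined_label(primary, alternatives, x):
    """Combine a trusted primary net with several alternative nets.

    primary, alternatives : callables that return a predicted label for image x
    The primary net's answer stands unless every alternative net disagrees with it.
    """
    primary_label = primary(x)
    alt_labels = [net(x) for net in alternatives]
    if all(label != primary_label for label in alt_labels):
        # Unanimous disagreement: fall back to the alternatives' most common answer
        return max(set(alt_labels), key=alt_labels.count)
    return primary_label
```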
Here is the outline of a suggested algorithm that attempts to defeat the hacker:
We start with our primary, lovingly trained neural net, which is fairly reliable. We also construct a set of perhaps 10 alternative nets, with different architectures and trained on different data. When we want to recognize an image, we first use a hash of the image as a random seed and then use a random number generator to pick, say, 5 of the 10 alternative nets. The image is then passed through our primary net along with our 5 chosen alternative nets. The final result is some non-linear combination of the primary net's output and the others. When the hacker comes along and uses his gradient ascent method, he would find that there is so much noise from our random switching of the alternative nets that his method doesn't work well. And so we would have a fairly robust defense against the hack.
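Here is a minimal sketch of that scheme. The SHA-256 hash, the min/median combination and the net interfaces are all assumptions of mine; the point is only to show the hash-seeded selection of nets and one possible non-linear combination of their outputs.

```python
import hashlib
import random
import statistics

def randomized_ensemble_score(x, primary, alternative_nets, n_pick=5):
    """Score an image with the primary net plus a hash-selected subset of alternatives.

    x                : numpy array holding the image
    primary          : callable returning the primary net's rifle likelihood for x
    alternative_nets : list of callables, e.g. 10 differently trained nets
    """
    # Hash the raw image bytes and use the digest as a random seed: the same
    # image always selects the same subset of nets, while a one-pixel change
    # reshuffles which nets are consulted.
    seed = int.from_bytes(hashlib.sha256(x.tobytes()).digest()[:8], "big")
    rng = random.Random(seed)
    chosen = rng.sample(alternative_nets, n_pick)

    scores = [net(x) for net in chosen]
    # One possible non-linear combination: take the primary net's score but cap
    # it at the median of the chosen alternatives, so no single fooled net can
    # drag the combined likelihood up on its own.
    return min(primary(x), statistics.median(scores))
```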
There are some drawbacks to this approach. Instead of constructing 1 neural net, we would construct an extra 10, and at run time 5 of the alternative nets would be evaluated on top of the primary one. We would also find that if the exact same image is presented twice the same result is returned, but if even one pixel is tweaked we could get quite a different result. Some users may deem this instability undesirable.