MIT Tricks Google’s Vision AI into thinking a turtle is a gun

Dec 21, 2017

John Aldred

John Aldred is a photographer with over 20 years of experience in the portrait and commercial worlds. He is based in Scotland and has been an early adopter – and occasional beta tester – of almost every digital imaging technology in that time. As well as his creative visual work, John uses 3D printing, electronics and programming to create his own photography and filmmaking tools and consults for a number of brands across the industry.

There have been a lot of positive, useful and sometimes amusing stories about various image AI and machine learning systems over the past couple of years. There have also been some that are either quite creepy or simply the stuff of nightmares. Whatever you use image recognition AI for, though, it seems it can be fooled fairly easily with a little bit of work.

A team at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has found that these systems are even easier to fool than previously thought. In a new paper, they describe a system that is up to 1,000 times faster than existing methods. And it works against “black box” systems, too – closed-source systems whose code an attacker has no access to.

One way to fool visual machine learning is essentially a “brute force” style attack, an approach that has been common for years in cracking passwords. You fire a guess at the system, it comes back false, you change one character at a time, then rinse and repeat. Eventually, you’ll get in. It might take a while, but you’ll get there.

In the case of image recognition, the method does something similar. You have a starting photo: this is what you want the system to recognise your subject as. Then you have the end photo: the image you want to be misrecognised as that starting subject. You then create the images in between, transitioning from start to end by adjusting each pixel slightly, one frame at a time, and checking the classifier’s answer as you go.
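The query-and-check loop described above can be sketched in a few lines. This is a toy illustration only, not the MIT team’s actual method: the classifier here is a hypothetical stand-in (in a real attack it would be an opaque API you can only query), and all names, pixel values and step sizes are made up for the example.

```python
import random

# Hypothetical stand-in for a black-box classifier: it only exposes a
# score for the target label, the way a cloud vision API exposes
# confidence values. TARGET_TEMPLATE is an invented "ideal" image.
TARGET_TEMPLATE = [0.9] * 16

def target_score(image):
    # Higher score = classifier is more confident the image is the target class.
    return -sum((p - t) ** 2 for p, t in zip(image, TARGET_TEMPLATE))

def perturb(image, steps=200, eps=0.05, budget=0.4):
    # Greedy black-box attack: nudge one random pixel at a time, keep the
    # nudge only if the target score improves, and cap how far each pixel
    # may drift from the original so the picture still "looks" like the
    # starting subject to a human.
    original = list(image)
    adv = list(image)
    best = target_score(adv)
    rng = random.Random(0)  # seeded for repeatability
    for _ in range(steps):
        i = rng.randrange(len(adv))
        delta = rng.choice([-eps, eps])
        candidate = adv[i] + delta
        if abs(candidate - original[i]) > budget:
            continue  # stay within the perceptual budget
        adv[i] = candidate
        score = target_score(adv)
        if score > best:
            best = score  # keep the improving nudge
        else:
            adv[i] -= delta  # revert a nudge that didn't help

    return adv

start = [0.5] * 16  # a flat grey "image" of 16 pixels
adv = perturb(start)
print(target_score(start), target_score(adv))
```

Note that the loop never looks inside `target_score` – it only compares outputs before and after each nudge, which is what makes this style of attack viable against closed systems.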

YouTube video

And now, the system recognises the original subject in the new image, even though we know it’s not there.

The team has also created a method showing that classifiers can be fooled into misclassifying real-world 3D objects, too. This video shows a turtle being misidentified as a rifle and a baseball as an espresso.

YouTube video

It’s a fascinating topic, but it’s also quite scary just how fallible machine learning image recognition systems still are. Of course, they’re still relatively new, so it’s not all that surprising. But it does make one wonder just how far we should trust them, especially as facial and object recognition become more pervasive in our lives through things like unlocking our phones, finding us on social media, and more consequential uses.

The team say they’ve tested their methods against Google’s Cloud Vision API, but believe they should also work against similar APIs from Facebook and other companies.

As such systems start to see more and more imagery, it’s difficult to predict whether they’ll get smarter or even more confused. But exploits like these, which allow misidentification and misclassification of subjects, are definitely something that will need to be resolved one way or another.

[via Robotics Trends]

