Facebook’s ‘Red Team’ Hacks Its Own AI Programs

Attackers increasingly try to confuse and bypass machine-learning systems. So the companies that deploy them are getting creative.
Illustration: Ariel Davis

Instagram encourages its billion or so users to add filters to their photos to make them more shareable. In February 2019, some Instagram users began editing their photos with a different audience in mind: Facebook’s automated porn filters.

Facebook depends heavily on moderation powered by artificial intelligence, and it says the tech is particularly good at spotting explicit content. But some users found they could sneak past Instagram’s filters by overlaying patterns such as grids or dots on rule-breaking displays of skin. That meant more work for Facebook's human content reviewers.

Facebook’s AI engineers responded by training their system to recognize banned images with such patterns, but the fix was short-lived. Users “started adapting by going with different patterns,” says Manohar Paluri, who leads work on computer vision at Facebook. His team eventually tamed the problem of AI-evading nudity by adding another machine-learning system that checks for patterns such as grids on photos and tries to edit them out by emulating nearby pixels. The process doesn’t perfectly recreate the original, but it allows the porn classifier to do its work without getting tripped up.
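Facebook hasn't published how that cleanup system works, but the basic idea of erasing an overlay by emulating nearby pixels can be sketched with a crude neighborhood-averaging inpainter. Everything here (the image, the grid mask, the `inpaint_overlay` helper) is an illustrative assumption, not Facebook's actual pipeline:

```python
import numpy as np

def inpaint_overlay(img, mask):
    """Replace masked (overlay) pixels with the mean of their unmasked
    3x3 neighbours -- a crude stand-in for the 'emulate nearby pixels'
    repair described in the article. Illustrative only."""
    out = img.astype(float).copy()
    h, w = img.shape
    ys, xs = np.where(mask)
    for y, x in zip(ys, xs):
        y0, y1 = max(0, y - 1), min(h, y + 2)
        x0, x1 = max(0, x - 1), min(w, x + 2)
        patch = img[y0:y1, x0:x1]
        keep = ~mask[y0:y1, x0:x1]  # only copy from pixels outside the overlay
        if keep.any():
            out[y, x] = patch[keep].mean()
    return out

# A flat grey image with bright grid lines drawn every 4 rows.
img = np.full((16, 16), 100.0)
grid = np.zeros_like(img, dtype=bool)
grid[::4, :] = True
overlaid = img.copy()
overlaid[grid] = 255.0

repaired = inpaint_overlay(overlaid, grid)  # grid lines filled back to grey
```

As the article notes, this kind of repair doesn't perfectly recreate the original image; the point is only to remove the pattern well enough that the downstream classifier isn't tripped up.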

That cat-and-mouse incident helped prompt Facebook a few months later to create an “AI red team” to better understand the vulnerabilities and blind spots of its AI systems. Other large companies and organizations, including Microsoft and government contractors, are assembling similar teams.

Those companies spent heavily in recent years to deploy AI systems for tasks such as understanding the content of images or text. Now some early adopters are asking how those systems can be fooled and how to protect them. “We went from ‘Huh? Is this stuff useful?’ to now it’s production-critical,” says Mike Schroepfer, Facebook’s chief technology officer. “If our automated system fails, or can be subverted at large scale, that’s a big problem.”

The work of protecting AI systems bears similarities to conventional computer security. Facebook’s AI red team gets its name from a term for exercises in which hackers working for an organization probe its defenses by role-playing as attackers. They know that any fixes they deploy may be side-stepped as their adversaries come up with new tricks and attacks.

In other ways, though, mitigating attacks on AI systems is very different from preventing conventional hacks. The vulnerabilities that defenders worry about are less likely to be specific, fixable bugs, and more likely to reflect built-in limitations of today’s AI technology. “It’s different from cybersecurity in that these things are inherent,” says Mikel Rodriguez, a researcher who works on AI vulnerabilities at MITRE Corporation, a nonprofit that runs federal research programs. “You could write a machine-learning model that’s perfectly secure, but it would still be vulnerable.”

The growing investment in AI security mirrors how Facebook, Google, and others also are thinking harder about the ethical consequences of deploying AI. Both problems have roots in the fact that despite its usefulness, existing AI technology is narrow and inflexible, and it can’t adapt to unforeseen circumstances in the way people can.

A growing library of machine-learning research papers documents tricks like altering just a few pixels in a photo to make AI software hallucinate and detect objects that are not present. One study showed that a Google image-recognition service could be fooled into categorizing a rifle as a helicopter; another study 3D-printed objects with a multifaceted shape that made them invisible to the lidar software of a prototype self-driving car from China’s Baidu. Other attacks include “data poisoning,” where an adversary alters the data used to train a machine-learning algorithm, to compromise its performance.
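The pixel-altering attacks in that research literature typically nudge each input value a small amount in whatever direction moves the model's output toward the attacker's goal. A minimal version of the idea, on a toy linear classifier rather than any real image-recognition service (the model, data, and threshold are all assumptions for illustration), looks like this:

```python
import numpy as np

# Toy linear "classifier": score = w . x; the sign of the score is the label.
rng = np.random.default_rng(0)
w = rng.normal(size=100)   # model weights (known to the attacker here)
x = rng.normal(size=100)   # the original input
score = w @ x

# Gradient-sign attack: the gradient of the score w.r.t. x is just w, so
# step every feature by epsilon in the direction that flips the score's sign.
epsilon = (abs(score) + 1.0) / np.abs(w).sum()  # just enough to flip the label
x_adv = x + epsilon * (-np.sign(score) * np.sign(w))

# Each feature moved by only epsilon, yet the predicted class changes.
flipped = np.sign(w @ x_adv) != np.sign(score)
```

The per-feature change `epsilon` is tiny relative to the data, which is why such perturbations can be invisible to people while still flipping a model's decision. Data poisoning works on the other end of the pipeline: instead of perturbing an input at prediction time, the attacker corrupts the training set itself.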

MITRE is working with government clients in areas such as transportation and national security on how they might minimize such vulnerabilities. Rodriguez declines to share details, but he says that just as at Facebook, some US government agencies want to know what could go wrong with the AI they are building into critical functions. His team's projects have included showing it was possible to extract the faces used to train a facial-recognition algorithm, and deceiving machine-learning software installed on aircraft flying overhead to interpret their surroundings. The Department of Defense plans to make AI an increasingly central plank of the US military, from spotting threats on the battlefield to health care and back-office admin.

Facebook’s AI red team is led by Cristian Canton, a computer-vision expert who joined the company in 2017 and ran a group that worked on image moderation filters. He was proud of his team’s work on AI systems to detect banned content such as child pornography and violence, but he began to wonder how robust they really were.

In 2018, Canton organized a “risk-a-thon” in which people from across Facebook spent three days competing to find the most striking way to trip up those systems. Some teams found weaknesses that Canton says convinced him the company needed to make its AI systems more robust.

One team at the contest showed that using different languages within a post could befuddle Facebook’s automated hate-speech filters. A second discovered the attack used in early 2019 to spread porn on Instagram, but it wasn’t considered an immediate priority to fix at the time. “We forecast the future,” Canton says. “That inspired me that this should be my day job.”

In the past year, Canton’s team has probed Facebook’s moderation systems. It also began working with another research team inside the company that has built a simulated version of Facebook called WW that can be used as a virtual playground to safely study bad behavior. One project is examining the circulation of posts offering goods banned on the social network, such as recreational drugs.

The red team’s weightiest project aims to better understand deepfakes, imagery generated using AI that looks like it was captured with a camera. The results show that preventing AI trickery isn’t easy.

Deepfake technology is becoming easier to access and has been used for targeted harassment. When Canton’s group formed last year, researchers had begun to publish ideas for how to automatically filter out deepfakes. But he found some results suspicious. “There was no way to measure progress,” he says. “Some people were reporting 99 percent accuracy, and we were like ‘That is not true.’”

Facebook’s AI red team launched a project called the Deepfake Detection Challenge to spur advances in detecting AI-generated videos. It paid 4,000 actors to star in videos featuring a variety of genders, skin tones, and ages. After Facebook engineers turned some of the clips into deepfakes by swapping people’s faces around, developers were challenged to create software that could spot the simulacra.

The results, released last month, show that the best algorithm could spot deepfakes that were not in Facebook’s collection only 65 percent of the time. That suggests Facebook isn’t likely to be able to reliably detect deepfakes soon. “It’s a really hard problem, and it’s not solved,” Canton says.

Canton’s team is now examining the robustness of Facebook's misinformation detectors and political ad classifiers. “We’re trying to think very broadly about the pressing problems in the upcoming elections,” he says.

Most companies using AI in their business don’t have to worry as Facebook does about being accused of skewing a presidential election. But Ram Shankar Siva Kumar, who works on AI security at Microsoft, says they should still worry about people messing with their AI models. He contributed to a paper published in March that found 22 of 25 companies queried did not secure their AI systems at all. “The bulk of security analysts are still wrapping their head around machine learning,” he says. “Phishing and malware on the box is still their main thing.”

Last fall Microsoft released documentation on AI security developed in partnership with Harvard that the company uses internally to guide its security teams. It discusses threats such as “model stealing,” where an attacker sends repeated queries to an AI service and uses the responses to build a copy that behaves similarly. That “stolen” copy can either be put to work directly or used to discover flaws that allow attackers to manipulate the original, paid service.
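The mechanics of model stealing are simple to sketch: query the service, record its answers, and fit a local model to the query-response pairs. The "victim" below is a made-up linear model, chosen because it can be copied exactly with ordinary least squares; real services expose far less, but the attack pattern is the same:

```python
import numpy as np

# Hypothetical victim: a secret linear model hidden behind an API.
rng = np.random.default_rng(1)
secret_w = rng.normal(size=5)

def query_api(x):
    """The attacker sees only the service's output, never secret_w."""
    return x @ secret_w

# Attacker sends many random queries and logs the responses...
X = rng.normal(size=(50, 5))
y = np.array([query_api(x) for x in X])

# ...then fits a local copy by least squares on the observed pairs.
stolen_w, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With 50 responses to pin down 5 unknown weights, the copy here matches the original exactly; against real classifiers the stolen model is only approximate, but as the documentation warns, even an approximate copy can be probed offline for flaws that transfer back to the paid service.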

Battista Biggio, a professor at the University of Cagliari who has been publishing studies on how to trick machine-learning systems for more than a decade, says the tech industry needs to start automating AI security checks.

Companies use batteries of preprogrammed tests to check for bugs in conventional software before it is deployed. Biggio says improving the security of AI systems in use will require similar tools, potentially building on attacks he and others have demonstrated in academic research.

That could help address the gap Kumar highlights between the number of deployed machine-learning algorithms and the number of people knowledgeable about their potential vulnerabilities. However, Biggio says biological intelligence will still be needed, since adversaries will keep inventing new tricks. “The human in the loop is still going to be an important component,” he says.
