Nov 10, 2017 10:45 AM

Facebook's Not Listening Through Your Phone. It Doesn't Have To

The internet is awash in theories about Facebook using your smartphone's microphone to eavesdrop on your conversations. It's not. Here's why.


In the bright-eyed naiveté of my first few weeks as Facebook's first leader of the ads targeting effort, I'd eagerly confront each new conspiracy theory.

"Is Facebook scanning my photos and using that for ad targeting?" was one from a Los Angeles Times reporter. "My cousin uploaded a photo of her boyfriend in a San Francisco 49ers jersey, and now I'm seeing 49ers ads. How'd that happen?"

And so it went.

I'd also field new targeting ideas from Facebook employees themselves, who would construct just-so stories around some niche piece of user behavior, and how that could move the needle on Facebook's already soaring ad revenue (e.g. 'show burger ads to people who checked into In-N-Out').

Inevitably, the conspiracy theories and new ideas would die on the rocks of the threefold criterion I eventually formulated to debunk or discard (almost) all of them.

Is it possible?

Is it common?

Does it work?

Feasibility, ubiquity, and efficacy: Those filters demolish almost every Facebook conspiracy theory you'll ever hear.

One that you may have heard recently: Facebook snoops on you via your smartphone's microphone. As with all such theories—9/11 truthers, Obama birthers, 'grassy knoll' advocates—there's just enough seeming evidence to wrap a story around. Here's one viral video supposedly demonstrating the phenomenon.

But it's all bullshit.

Let's put our corporate-branded Facebook product-manager hoodie on for a closer examination. Even if you haven't deleted the Facebook app from your phone, or relegated your phone to a soundproof box, a quick walkthrough of this most recent theorizing will demonstrate just how Facebook thinks about monetizing you, and why your microphone doesn't factor in.

Is It Possible?

To make it happen, Facebook would need to record everything your phone hears while it's on. This is functionally equivalent to an always-on phone call from you to Facebook. Your average voice-over-internet call takes something like 24 kbps one way, which amounts to about 3 kB of data per second. Assume you've got your phone on half the day, and that's about 130 MB per day, per user. There are around 150 million daily active users in the US, so that's about 20 petabytes per day, just in the US.

To put that in perspective, Facebook's entire data storage is 'only' about 300 petabytes, with a daily ingestion rate of about 600 terabytes. Put another way, constant audio surveillance would produce about 33 times more data daily than Facebook currently consumes.
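The back-of-the-envelope arithmetic above can be checked in a few lines. This is only a rough sketch: the inputs are the figures quoted in the text (24 kbps, half a day of listening, 150 million US daily users, 600 terabytes of daily ingestion), while the variable names and the decimal unit conversions (1 kB = 1,000 bytes) are illustrative assumptions.

```python
# Rough check of the always-on-audio estimate, using the article's figures.
# Decimal units (1 kB = 1,000 bytes) are an assumption made for simplicity.

BITRATE_KBPS = 24              # one-way VoIP bitrate quoted in the text
HOURS_ON_PER_DAY = 12          # "half the day"
US_DAILY_USERS = 150_000_000   # approximate US daily active users
DAILY_INGEST_TB = 600          # quoted daily ingestion rate


def audio_estimate():
    """Return (kB/s, MB per user per day, PB per day, ratio to ingestion)."""
    kb_per_second = BITRATE_KBPS / 8                          # kilobits -> kilobytes
    mb_per_user = kb_per_second * HOURS_ON_PER_DAY * 3600 / 1_000
    pb_total = mb_per_user * US_DAILY_USERS / 1_000_000_000   # MB -> PB
    ratio = pb_total * 1_000 / DAILY_INGEST_TB                # PB -> TB, then compare
    return kb_per_second, mb_per_user, pb_total, ratio


kb_s, mb_user, pb_day, ratio = audio_estimate()
print(f"{kb_s:.0f} kB/s, {mb_user:.0f} MB per user per day")
print(f"{pb_day:.1f} PB per day, about {ratio:.0f}x daily ingestion")
```

Without intermediate rounding, the total comes out slightly under 20 petabytes and the ratio slightly under 33; the figures in the text round up at each step, which does not change the conclusion.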

Furthermore, such snooping would be eminently detectable, ringing up noticeable amounts of data on your smartphone as Facebook maintained your always-on call to Zuckerberg. Ever searched for something on your phone while making a call? Notice how it slows to a crawl? Your phone would be like that all the time if Facebook were listening.

Of course, there's a smarter way to do it. The Amazon Echo voice-controlled personal assistant (and its Google equivalent, Google Home) put a slightly Orwellian-seeming listening device in many American homes. The Echo has just enough hardware to detect a very small set of 'trigger' words, which start it listening. Once it detects that trigger word, it's also just smart enough to record the command that follows it, and send it to the Amazon mothership, where the real speech-to-text translation and natural language processing work happens. Data or a request for more details is then beamed back, and your conversation with 'Alexa' continues. The Echo functions merely as a microphone, speaker, and weak computer that does a small voice-recognition task well.

Could the Facebook app do the same, listening only for specific keywords that trigger ads?

Not exactly. The Facebook targeting system had something like a million targetable keywords when I left, and it's likely held steady or increased slightly. But unlike the Amazon Echo, which listens for just one of four trigger words, millions or perhaps billions of words and phrases could land you in a Facebook targeting segment.

For example, saying 'golf,' 'Tiger Woods,' 'The Masters,' or 'Augusta National Golf Course' all should land you in the 'Golf' targeting segment, and your phone would need to detect each and every one. Because it has no specific trigger word for Facebook, your phone would need to listen for every targetable keyword. That means the speech-to-text translation code could only run on your phone itself, a taxing demand even for the beefy cloud servers that usually handle those tasks.

You could maybe hack around the problem by limiting the keyword list, or tightening the mapping from spoken word to targeting keyword to reduce the search space (only the literal word 'golf' instead of 'Tiger Woods'), but it's still daunting to do on every smartphone in existence, from slow, older phones to fast flagships like the iPhone X. Targeting a specific type of phone would ease that burden somewhat, but any significant scale presents an extraordinary challenge.
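To make the keyword problem concrete, here is a minimal sketch of the lookup step, assuming a transcript already exists. The phrases and the 'Golf' segment come from the example above; the dictionaries, the function name, and the matching rule are hypothetical illustrations, not a description of how Facebook's targeting system works. The hard part, turning audio into text on the phone, is not shown.

```python
import re

# Hypothetical keyword-to-segment tables. The phrases and the "Golf" segment
# are taken from the article's example; the structure is an assumption.
BROAD_MAPPING = {
    "golf": "Golf",
    "tiger woods": "Golf",
    "the masters": "Golf",
    "augusta national golf course": "Golf",
}

# Tightened mapping: only the literal word counts.
TIGHT_MAPPING = {"golf": "Golf"}


def match_segments(transcript, mapping):
    """Return the segments whose keywords appear as whole words or phrases."""
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    text = " " + " ".join(words) + " "      # pad so matches fall on word boundaries
    return {segment for phrase, segment in mapping.items()
            if f" {phrase} " in text}


sentence = "Did you see Tiger Woods at the Masters?"
print(match_segments(sentence, BROAD_MAPPING))   # matched via the broad table
print(match_segments(sentence, TIGHT_MAPPING))   # no literal "golf" was spoken
```

The tightened table is cheaper to check, but it misses the sentence entirely, which is the trade-off described above: shrinking the search space also shrinks what the system can catch.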

Furthermore, as in our naive approach above, this would be eminently noticeable as a performance degradation on your phone, since the background inference process would soon eat all your phone's CPU and battery, something you could easily check via the device's monitoring tools. That could change as smartphones get more powerful, and mobile developers more clever at running real computation in situ, but Facebook's targeting engine won't be running on your phone anytime soon.

In brief, the technical challenges of a Zuckian Big Brother scenario are overwhelming, and not likely to be fixed very soon. It’s just not possible at scale.

But what if those technical realities disappeared?

Is It Common?

Let’s assume you could magically generate a perfect digital transcript of every spoken conversation overheard by a Facebook-enabled smartphone. No bandwidth hogging, no pegged CPUs, just a faithful jotting-down of your every utterance.

What fraction of that transcript would contain anything commercially of interest to an advertiser?

Not much, it turns out.

We did just such a test in my first year at Facebook. Code-named 'Project Chorizo,' it involved pushing every piece of Facebook user data then available—posts, link shares, check-ins—into the targeting grinder and seeing if it improved ads performance. Before we even got to the performance side of things (and we’ll cover that shortly), we were instantly struck by how small a fraction of Facebook content even triggered interest from the targeting machine. On the order of single-digit percentages of Facebook posts resulted in any sort of reading from the targeting machine. It was like pressing a field of livestock into the sausage grinder, and getting out one hot dog as a result. And Facebook users are a very large herd.

Herein lies one of the key misunderstandings about Facebook, which I like to call the Narcissistic Fallacy. We’re all the center of our own worlds, and assume our lives are terribly important or interesting to outsiders. As a result, we equate what we’d most hate to have revealed with what advertisers (or Facebook) would most like to know. But that’s a completely false equivalence; advertisers don’t care about the vast majority of even your most personal data.

Put another way: Just because I have a naked photo of you on the internet, doesn’t mean anyone would pay money to see it.

The same goes for most of your Facebook data, including your conversations. While there are probably a few conversational snippets that would reveal something commercially interesting, the data advertisers really want to use for targeting isn’t on Facebook. No, that data resides instead in your Amazon shopping cart, or your car dealer, or your local Target, or every other place you tip your hand to capitalism about your wants and desires.

Does It Work?

Never mind feasibility or ubiquity. Imagine Facebook did overhear all those juicy conversations. What does it do with them?

“I need to fly from New York to Boston on December 21st, for less than $300.”

Start running travel ads, Kayak!

“Trump is a real genius, isn’t he? #resist”

Hey GOP…oh wait, no. Hey Democrats. Or is this a Stein supporter?

“That Mark guy at work is a real dog. Asked me out on a date even though I mentioned the boyfriend.”

Purina! Wait, no. OkCupid! Hold on … let me run that AI job again.

Human language is overrun with sarcasm, innuendo, double-entendre, and pure obfuscation. To assume that at-Facebook-scale AI will be able to figure out, even at the fluky level of internet advertising, just what you crave based on any given statement gives these technologies more credence (or paranoia) than they deserve.

Consider again 'Project Chorizo'. After all that sausage-grinding, the uptick in clickthrough rate thanks to inputting user posts to the targeting system was minimal. Not zero, mind you, but way less than advertisers would pay for.

So what explains all these eerie anecdotes and viral YouTube videos?

The vast majority seem to amount to confirmation bias, the internet equivalent of wondering why it always rains after you wash the car. You’re watching the video of the one Facebook user who experienced some improbable event, and ignoring the millions of users who had no such odd coincidence.

Not that every such coincidence is false. Some are pure correlation-means-causation confusion. Go back to that uploaded 49ers jersey photo. What really happened: The 49ers were playing that weekend, explaining both the jersey being worn, and an ad campaign simultaneously targeted at the SF Bay Area. One didn’t cause the other; both were caused by some externality the reporter had ignored.

The harsh truth is that Facebook doesn’t need to perform technical miracles to target you via weak signals. It’s got much better ways to do so already. Not every spookily accurate ad you see is a pure figment of your cognitive biases. Remember, Facebook can find you on whatever device you’ve ever checked Facebook on. It can exploit everything that retailers know about you, and even sometimes track your in-store, cash-only purchases; that loyalty discount card is tied to a phone number or email for a reason.

Before you stoke your Facebook rage too much, know that Twitter and LinkedIn do this as well, and that Facebook copied the concept of 'data onboarding' from the greater ad tech world, which in turn drafted off of decades of direct-mail consumer marketing. It’s hard to escape the modern Advertising Industrial Complex.

The short version to all this tin-foil-hat theorizing: There's no way Facebook is eavesdropping on you right now. But it is tracking you in other—no less insidious—ways you’re not aware of. To quote the soldier's maxim, it's always the shot you don't hear that ultimately gets you.

WIRED Opinion publishes pieces written by outside contributors and represents a wide range of viewpoints.