Ads, privacy and confusion

The consumer internet industry spent two decades building a huge, complex, chaotic pile of tools and systems to track and analyse what people do on the internet, and we’ve spent the last half-decade arguing about that, sometimes for very good reasons, and sometimes with strong doses of panic and opportunism. Now that’s mostly going to change, between unilateral decisions by some big tech platforms and waves of regulation from all around the world. But though we say ‘privacy’ a lot, I think we lack any coherent, systematic sense of what that would mean, or even quite what we’re trying to achieve. We are confused, and there are many unresolved questions.

First, can we achieve the underlying economic aims of online advertising in a private way? Advertisers don’t necessarily want (or at least need) to know who you are as an individual. As Tim O’Reilly put it, data is sand, not oil - all this personal data actually only has value in the aggregate of millions. Advertisers don’t really want to know who you are - they want to show diaper ads to people who have babies, not to show them to people who don’t, and to have some sense of which ads drove half a million sales and which ads drove a million sales. Targeting ads per se doesn’t seem fundamentally evil, unless you think putting car ads in car magazines is also evil.

But the internet became able to show car ads to people who read about cars yesterday, somewhere else - to target based on the user rather than the context. This is both exactly the same and completely different. In practice, ‘showing car ads to people who read about cars’ led the adtech industry to build vast piles of semi-random personal data, aggregated, disaggregated, traded, passed around and sometimes just lost, partly because it could and partly because that appeared to be the only way to do it. After half a decade of backlash, there are now a bunch of projects trying to get to the same underlying advertiser aims - to show ads that are relevant, and get some measure of ad effectiveness - while keeping the private data private. This is the theory behind Google’s FLoC and Apple’s rather similar tracking and ad-targeting system - do the analysis and tracking on the device, show relevant ads but don’t give advertisers or publishers the underlying personal data.

However, even if the tech works and the industry can get to some kind of consensus behind any such project (both very big questions), would this really be private? And what does it do to competition? This takes me to a second question - what counts as ‘private’, and how can you build ‘private’ systems if we don’t know?

Apple has pursued a very clear theory that analysis and tracking is private if it happens on your device and is not private if it leaves your device or happens in the cloud. Hence, it’s built a complex system of tracking and analysis on your iPhone, but is adamant that this is private because the data stays on the device. People have seemed to accept this (so far - or perhaps they just haven’t noticed it), but acting on the same theory Apple also created a CSAM scanning system that it thought was entirely private - ‘it only happens on your device!’ - that created a huge privacy backlash, because a bunch of other people think that if your phone is scanning your photos, that isn’t ‘private’ at all. So is ‘on device’ private or not? What’s the rule? What if Apple tried the same model for ‘private’ ads in Safari? How will the public take FLoC? I don’t think we know.

On / off device is one test, but another, much broader one is first party / third party: the idea that it’s OK for a website to track what you do on that website but not OK for adtech companies to track you across many different websites. This is the core of the cookie question, and sounds sensible, and indeed one might think that we do have a pretty good consensus on ‘third party cookies’ - after all, Google and Apple are getting rid of them.

However, I’m puzzled by some of the implications. “1p good / 3p bad” means that it’s OK for the New York Times to know that you read ten New York Times travel pieces and show you a travel ad, but not OK for the New Yorker to know that and show you the same ad. Why, exactly, is that a policy objective? Indeed, are we sure that it’s ‘private’ for the New York Times to record and analyse everything a logged-in user read on that site for the last decade? What would happen to its ad revenue if it dumped your history after 24 hours? (Cynically, the answer might be ‘not much’.) Is that different to Facebook recording and analysing everything you read on Facebook?

At this point one answer is to cut across all these questions and say that what really matters is whether you disclose whatever you’re doing and get consent. Steve Jobs liked this argument. But in practice, as we’ve discovered, ‘get consent’ means endless cookie pop-ups full of endless incomprehensible questions that no normal consumer should be expected to understand, and that just train people to click ‘stop bothering me’. Meanwhile, Apple’s on-device tracking doesn’t ask for permission, and opts you in by default, because, of course, Apple thinks that if it’s on the device it’s private. Perhaps ‘consent’ is not a complete solution after all.

But the bigger issue with consent is that it’s a walled garden, which takes me to a third question - competition. Most of the privacy proposals on the table are in absolute, direct conflict with most of the competition proposals on the table. If you can only analyse behaviour within one site but not across many sites, or make it much harder to do that, companies that have a big site where people spend lots of time have better targeting information and make more money from advertising. If you can only track behaviour across lots of different sites if you do it ‘privately’ on the device or in the browser, then the companies that control the device or the browser have much more control over that advertising (which is why the UK CMA is investigating FLoC). And, as an aside, if you can only target on context, not the user, then Hodinkee is fine but the Guardian’s next landmark piece on Kabul has no ad revenue. Is that what we want? What else might happen?

These are all unresolved questions, and the more questions you ask the less clear things can become. I’ve barely touched on a whole other line of enquiry - of where all the world’s $600bn of annual ad spending would be reallocated when all of this has happened (no, not to newspapers, sadly). Apple clearly thinks that scanning for CSAM on the device is more private than scanning in the cloud, but a lot of other people think the opposite. You can see the same confusion in ideas like ‘Facebook sells your data’ (which, of course, it doesn’t) or ‘surveillance capitalism’ - these are really just attempts to avoid the discussion by reframing it, and moving it to a place where we do know what we think, rather than engaging with the challenge and trying to work out an answer. I don’t have an answer either, of course, but that’s rather my point - I don’t think we even agree on the questions.

Policy, Advertising | Benedict Evans | 27 August 2021