Early Status of the FLoC Origin Trials

By Don Marti

Federated Learning of Cohorts, or FLoC, is a new advertising proposal from Google that has been running on a small sample of Chrome browsers for almost two months now. Google has given publishers early access to FLoC data through a research program called an Origin Trial. We’ve captured millions of data points from FLoC-enabled browsers since April and are developing approaches to analyzing this data so we can help advertisers and publishers begin to understand how targeting FLoCs will work in the cookieless future. This is a first look at what we’ve learned so far about how FLoC works.

At this early stage, FLoC isn’t yet being used by advertisers to target and buy ads. However, we can explore how FLoC cohorts line up to people’s actual content consumption interests, based on traffic trends across the 3,000+ sites that CafeMedia works with.

Shepherding the FLoC

If you’ve been living under a rock (or a cookie-flavored rock), here’s a quick refresher on FLoC. As Google deprecates third-party cookies, the Chrome browser team is developing a set of replacement technologies under the name “Privacy Sandbox.” Many of the proposals are under discussion at the World Wide Web Consortium (W3C), where CafeMedia is an active member. Some parts of the Privacy Sandbox are new browser technologies to improve privacy, while most of the proposals are ways to replace or extend the role of third-party cookies in matching ads to audiences and reporting on the results of ads, to support the critical role that advertising plays in funding amazing, free content.

FLoC assigns each Chrome browser a cohort identifier based on its user’s browsing history. Each cohort contains thousands of users, unlike the old third-party cookies, which can be unique to an individual. Each browser is only ever in a single cohort at a time, although that cohort can change as the user’s browsing behavior and content mix change.

Before we get into the analysis, it’s worth understanding how FLoC cohorts are calculated and how we aggregated the data. Each user’s cohort is determined by a two-step process. First, Google Chrome uses an algorithm called SimHash that takes a user’s browsing history and converts it into a really long number. This value is like a thumbnail of a large digital photo: you can’t reconstruct the entire browser history from just the SimHash value. Even so, the browser can’t reveal the SimHash itself, because a given value might apply to only a small number of users, or reveal something about their web history.
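To make the first step more concrete, here is a minimal sketch of the general SimHash technique applied to a list of visited domains. It is an illustration only, not Chrome’s implementation: the 64-bit width, the use of SHA-256 as the per-domain hash, and the function names are all assumptions made for this example.

```python
import hashlib

BITS = 64  # assumed fingerprint width; not a value taken from Chrome


def feature_hash(feature: str) -> int:
    """Hash one feature (here, a visited domain) to a BITS-bit integer."""
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[: BITS // 8], "big")


def simhash(features) -> int:
    """Combine the hashes of all features into one BITS-bit fingerprint."""
    votes = [0] * BITS
    for feature in features:
        h = feature_hash(feature)
        for bit in range(BITS):
            # Each feature votes +1 or -1 on every bit position.
            votes[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, vote in enumerate(votes):
        if vote > 0:  # the majority decides whether the bit is set
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Because every domain only casts votes, the fingerprint does not depend on the order of visits, and two histories that share most of their domains tend to produce fingerprints that differ in fewer bits than two unrelated histories.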

The next step in constructing a FLoC cohort is for Chrome to look up the SimHash in a table that includes ranges of similar SimHashes that should be grouped together. Each group has a shorter number, which is the FLoC cohort. This lookup table is set up to make sure that each cohort has at least a certain number of people in it, and the long SimHash is never sent to any site. Currently, the FLoC ID values go up to about 34,000, but not every possible cohort will be seen. Google checks each cohort, and if its members are significantly likely to visit domains that Google has tagged as “sensitive,” the browser will not send that cohort to any site.

The browser’s mapping based on Google’s lookup table takes advantage of the fact that browsers with similar SimHash results will have similar browsing histories, so grouping them makes sense: the results will be less precise, but still have value. For example, the highest numerical SimHash that ends up in cohort 17000 will be very similar to the lowest numerical SimHash in cohort 17001.
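The lookup step can be sketched as a binary search over sorted range boundaries. This is a simplified illustration under stated assumptions: the table contents, the blocked set, and the name lookup_cohort are hypothetical, and the real table is built and shipped by Google.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Set

# Hypothetical table: RANGE_STARTS[i] is the lowest SimHash value that
# belongs to cohort i. Real tables cover a far larger range of values.
RANGE_STARTS = [0, 100, 250, 600]

# Hypothetical set of cohorts withheld because they were flagged as sensitive.
BLOCKED_COHORTS = {2}


def lookup_cohort(
    simhash_value: int,
    range_starts: Sequence[int] = RANGE_STARTS,
    blocked: Set[int] = BLOCKED_COHORTS,
) -> Optional[int]:
    """Return the cohort ID for a SimHash value, or None if it is withheld."""
    if simhash_value < range_starts[0]:
        raise ValueError("SimHash value is below the first range")
    # Find the last range whose starting value is <= simhash_value.
    cohort_id = bisect_right(range_starts, simhash_value) - 1
    if cohort_id in blocked:
        return None  # the browser sends no cohort at all
    return cohort_id
```

With this toy table, lookup_cohort(99) returns 0 and lookup_cohort(100) returns 1: neighboring SimHash values land in the same or adjacent cohorts, which is the closeness property described above. lookup_cohort(250) returns None because cohort 2 is in the blocked set.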

Aggregating cohorts

We can take advantage of this “closeness” characteristic to start to gather insights. Groups of consecutive FLoC cohorts can let us see general patterns, and do analysis even when there isn’t a lot of data available yet.

To compensate for the low levels of data available so far, we group FLoCs into larger buckets, to make the data more actionable today. This article will feature kFLoCs, which are groups of 1,000 cohorts.

Currently, there are about 34,000 total possible FLoC cohorts, which means there are 34 kFLoCs. 
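The bucketing itself is just integer division. The following sketch shows one way to do it; the function names are our own for this example, and the label format simply mirrors the “20000s” style used in this article.

```python
from collections import Counter

KFLOC_SIZE = 1000  # cohorts per kFLoC, as defined in this article


def kfloc(cohort_id: int) -> int:
    """Return the index of the kFLoC that contains a cohort ID."""
    return cohort_id // KFLOC_SIZE


def kfloc_label(cohort_id: int) -> str:
    """Return a label such as '17000s' for the kFLoC containing a cohort ID."""
    return f"{kfloc(cohort_id) * KFLOC_SIZE:05d}s"


def kfloc_count(total_cohorts: int) -> int:
    """Return how many kFLoCs are needed to cover all cohort IDs."""
    return -(-total_cohorts // KFLOC_SIZE)  # ceiling division


def pageviews_by_kfloc(cohort_ids) -> Counter:
    """Count observations per kFLoC, given one cohort ID per pageview."""
    return Counter(kfloc(cohort_id) for cohort_id in cohort_ids)
```

For example, cohorts 17000 and 17001 both fall into kFLoC 17 (the “17000s”), and kfloc_count(34000) gives 34.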

The question on everyone’s mind is: what do the cohorts mean? Fortunately, our publishers have a broad cross-section of all types of content, so we can get a pretty solid view of how members of different cohorts visit different sorts of content. We do have some verticals that are larger than others, so we have attempted to normalize this data to remove any skew.

Each row in the spreadsheet shows a kFLoC and the ten content keywords that most over-index for that kFLoC compared to the average. We removed common words from the ranking to make it more meaningful. And the exciting thing is that you can immediately start to see some patterns!
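As a rough sketch of what “over-index” means here, the code below divides a keyword’s share of a kFLoC’s traffic by its share of all traffic, so an index of 1.0 is average. This is one common way to compute such an index, not necessarily the exact normalization behind the spreadsheet; the input format and function name are assumptions for illustration.

```python
from collections import Counter, defaultdict


def over_index(observations, top_n=10, stop_words=frozenset()):
    """Rank keywords per kFLoC by how far they exceed the overall average.

    observations: iterable of (kfloc, keyword) pairs, one pair per keyword
    occurrence on a page visited by a browser in that kFLoC.
    Returns {kfloc: [(keyword, index), ...]} sorted from highest index down.
    """
    per_kfloc = defaultdict(Counter)
    overall = Counter()
    for kfloc_id, keyword in observations:
        if keyword in stop_words:
            continue  # drop common words before ranking
        per_kfloc[kfloc_id][keyword] += 1
        overall[keyword] += 1

    total = sum(overall.values())
    ranking = {}
    for kfloc_id, counts in per_kfloc.items():
        kfloc_total = sum(counts.values())
        scores = {
            word: (count / kfloc_total) / (overall[word] / total)
            for word, count in counts.items()
        }
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        ranking[kfloc_id] = ranked[:top_n]
    return ranking
```

Working with shares rather than raw counts keeps a large kFLoC or a very popular keyword from dominating every row.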

For example, the 20000s look like they’re interested in crafting and learning with their family. Their common keywords include “pattern,” “crochet,” and “science.” The 14000s are into work-related topics such as “business,” “work,” and “model.” The 29000s are reading about “gear,” “model,” and “cars.” The 00000s look at pages about “pictures,” “music,” and “support.” Maybe if they need support, one of the 22000s, who like “build,” “apps,” and “online,” can help them out.

We’ve also turned this data into a visualization (powered by vis.js) where you can see the interrelationship of keywords across kFLoCs. Tools like this can help show similarities in cohorts that aren’t adjacent to each other, since people’s browsing histories are much more complicated than a single numeric relationship. Since a user is only in one cohort at a time, and people don’t just do one thing, cohorts have a mix of interest areas. The 31000s are likely to read about “paint” and “room” but also “movies,” maybe to watch while the paint dries?

We also looked at how different types of content attract different kFLoCs. We dug into a few content verticals to see how different kFLoCs visit (or avoid) certain content topics. Even with just a few content verticals, you can immediately see lots of interesting patterns.

For example, Food content is very broadly appealing, and almost all kFLoCs visit it equally, but you can see a few valleys (around 7000, 14000, 24000, and 28000-29000) that show groups that probably don’t cook as much, and maybe are a great target group for DoorDash to advertise to!

We grouped financial and technology content together and found much more pronounced spikes in a few groups of kFLoCs, as that sort of content is more deeply interesting to a smaller part of the overall population. By contrast, the long valley from 15000-26000 represents people (well, browsers) who probably aren’t that interested in this sort of content. So a brokerage firm, like Charles Schwab, wanting to reach likely investors might start by testing advertising success with FLoCs in the 14000 range.

Other interesting trends can be seen across Home and Garden content. A lot of valleys in the data show cohorts that aren’t that broadly interested in this type of content. A number of cohorts float around 1.0 (meaning they visit this sort of content at an average rate), and then there is a small number of high peaks. These could represent cohorts of people who are really actively thinking about home decor, design, or gardening and clearly would be very interesting to The Home Depot or Lowe’s.

We’ll be analyzing much more data over the coming weeks and months and will continue producing FLoC data, and insights, at an ever-increasing rate as more Chrome browsers join the origin trial. Most of the current data reflects users of the beta version of Chrome, and using a beta browser is a special interest of its own. We expect that the cohorts will change as FLoC gets turned on for more copies of Chrome.

Interested in learning more about FLoC and how it can help you advertise in the future? Contact hello@cafemedia.com

Interested in how CafeMedia can help your website navigate the complexities of digital advertising? Contact publishers@cafemedia.com