Jun 11, 2019 7:00 AM

Fans Are Better Than Tech at Organizing Information Online

Archive of Our Own, the fanfiction database recently nominated for a Hugo, has perfected a system of tagging that the rest of the web could emulate.

Kudos to the fans. One of the nominees for the Hugo Awards this year is Archive of Our Own, a fanfiction archive containing nearly 5 million fanworks—about the size of the English Wikipedia, and several years younger. It's not just the fanfic, fanart, fanvids, and other fanworks, impressive as they are, that make Archive of Our Own worthy of one of the biggest honors in science fiction and fantasy. It's also the architecture of the site itself.

At a time when we're trying to figure out how to make the internet livable for humans, without exploiting other humans in the process, Archive of Our Own (AO3, to its friends) offers something the rest of tech could learn from.

Here's a problem that AO3 users, like the rest of the internet, encounter every day: How do you find a particular thing you're interested in, while filtering out all the other stuff you don't care about? Most websites end up with tags of some sort. I might look through a medical journal database for articles tagged "cataracts," search a stock photo site for pictures tagged "businesspeople," or click on a social media hashtag to see what people are saying about the latest episode of #GameOfThrones.

Tags are useful, but they also have problems. Although "cataracts," "businesspeople," and #GameOfThrones might seem like the most obvious tags to me, someone else might have tagged these same topics "cataract surgery," "businessperson," and #GoT. Another person might have gone with "nuclear sclerosis" (a specific type of cataract), "office life," and #Daenerys. And so on.

There are two main ways of dealing with the problem of tagging proliferation. One is to be completely laissez-faire—let posters tag whatever they want and hope searchers can figure out what words they need to look for. It's easy to set up, but it tends to lead to an explosion of tags, as posters stack on more tags just in case and searchers don't know which one is best. Laissez-faire tags are common on social media; if I post an aesthetic photo of a book I'm reading on Instagram, I have over 20 relevant tags to choose from, such as #book #books #readers #reader #reading #reads #goodreads #read #booksofig #readersofig #booksofinstagram #readersofinstagram #readstagram #bookstagram #bookshelf #bookshelves #bookshelfie #booknerd #bookworm #bookish #bookphotography #bookcommunity #booklover #booksbooksbooks #bookstagrammer #booktography #readers #readabook #readmorebooks #readingtime #alwaysreading #igreads #instareads #amreading. "Am reading" indeed—reading full paragraphs of tags.

The other solution to the proliferation of competing tags is to implement a controlled, top-down, rigid tagging system. Just as the Dewey Decimal System has a single subcategory for Shakespeare so library browsers can be sure to find Hamlet near Romeo and Juliet, rigid tagging systems define a single list of non-overlapping tags and require that everyone use them. They're more popular in professional and technical databases than in public-facing social media, but they're a nice idea in theory—if you only allow the tag "cataract" then no one will have to duplicate effort by also searching under "cataracts" and "cataract surgery."

The problem is rigid tags take effort to learn; it's hard to convince the general public to memorize a gigantic taxonomy. Also, they become outdated. Tagging systems are a way of imposing order on the real world, and the world doesn't just stop moving and changing once you've got your nice categories set up. Take words related to gender and sexuality: The way we talk about these topics has evolved a lot in recent decades, but library and medical databases have been slower to keep up.

The Archive of Our Own has none of these problems. It uses a third tagging system, one that blends the best elements of both styles.

On AO3, users can put in whatever tags they want. (Autocomplete is there to help, but they don't have to use it.) Then behind the scenes, human volunteers look up any new tags that no one else has used before and match them with any applicable existing tags, a process known as tag wrangling. Wrangling means that you don't need to know whether the most popular tag for your new fanfic featuring Sherlock Holmes and John Watson is Johnlock or Sherwatson or John/Sherlock or Sherlock/John or Holmes/Watson or anything else. And you definitely don't need to tag your fic with all of them just in case. Instead, you pick whichever one you like, the tag wranglers do their work behind the scenes, and readers looking for any of these synonyms will still be able to find you.

AO3's trick is that it involves humans by design—around 350 volunteer tag wranglers in 2019, up from 160 people in 2012—who each spend a few hours a week deciding whether new tags should be treated as synonyms or subsets of existing tags, or simply left alone. AO3's Tag Wrangling Chairs estimate that the group is on track to wrangle about 2.7 million never-before-used tags in 2019, up from 2.4 million in 2018.
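To make the mechanics concrete, here is a minimal Python sketch of how wrangled tags could work. It is an illustration under stated assumptions, not AO3's actual code: the `TagIndex` class, its method names, and the canonical tag "Sherlock Holmes/John Watson" are all hypothetical. The idea is that posters keep their free-form tags, wranglers record synonym and subtag links separately, and a search follows those links.

```python
from collections import defaultdict


class TagIndex:
    """Toy model of wrangled tags (hypothetical, not AO3's implementation)."""

    def __init__(self):
        self.canonical_of = {}                # normalized tag -> canonical tag
        self.parent_of = {}                   # canonical subtag -> canonical parent
        self.works_by_tag = defaultdict(set)  # normalized raw tag -> work ids

    @staticmethod
    def normalize(tag):
        # Ignore case and extra whitespace when comparing tags.
        return " ".join(tag.lower().split())

    def add_work(self, work_id, tags):
        # Posters use whatever tags they like; nothing is rejected.
        for tag in tags:
            self.works_by_tag[self.normalize(tag)].add(work_id)

    def make_synonym(self, tag, canonical):
        # A wrangler decides that `tag` means the same thing as `canonical`.
        self.canonical_of[self.normalize(tag)] = self.normalize(canonical)

    def make_subtag(self, child, parent):
        # A wrangler decides that `child` is a narrower case of `parent`.
        self.parent_of[self.resolve(child)] = self.resolve(parent)

    def resolve(self, tag):
        # Unwrangled tags are simply left alone and resolve to themselves.
        key = self.normalize(tag)
        return self.canonical_of.get(key, key)

    def search(self, tag):
        target = self.resolve(tag)
        results = set()
        for raw, works in self.works_by_tag.items():
            current, seen = self.resolve(raw), set()
            # Walk up the subtag chain until we hit the target or run out.
            while current is not None and current not in seen:
                if current == target:
                    results |= works
                    break
                seen.add(current)
                current = self.parent_of.get(current)
        return results


index = TagIndex()
index.add_work("fic-1", ["Johnlock"])
index.add_work("fic-2", ["Sherlock/John"])
index.add_work("fic-3", ["Holmes/Watson"])
index.add_work("fic-4", ["tea"])  # a new tag nobody has wrangled yet

for variant in ["Johnlock", "Sherlock/John", "Holmes/Watson"]:
    index.make_synonym(variant, "Sherlock Holmes/John Watson")

print(sorted(index.search("johnlock")))  # ['fic-1', 'fic-2', 'fic-3']
```

Each fic was tagged only once, yet a search for any of the synonyms finds all three. The software part is simple; what the sketch cannot supply is the judgment behind each `make_synonym` call, which is the part the volunteers do.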

Laissez-faire and rigid tagging systems both fail because they assume too much—that users can create order from a completely open system, or that a predefined taxonomy can encompass every kind of tag a person might ever want. When these assumptions don't pan out, it always seems to be the user's fault. AO3's beliefs about human nature are more pragmatic, like an architect designing pathways where pedestrians have begun wearing down the grass, recognizing how variation and standardization can fit together. The wrangler system is one where ordinary user behavior can be successful, a system which accepts that users periodically need help from someone with a bird's-eye view of the larger picture.

Users appreciate this help. According to Tag Wrangling Chair briar_pipe, "We sometimes get users who come from Instagram or Tumblr or another unmoderated site. We can tell that they're new to AO3 because they tag with every variation of a concept—abbreviations, different word order, all of it. I love how excited people get when they realize they don't have to do that here."

When I tweeted about AO3's tags a while back, I received many comments from people wishing that their professional tagging systems were as good, including users of news sites, library catalogs, commercial sales websites, customer help-desk websites, and PubMed (the most prominent database of medical research). The other websites that compared favorably to AO3 were also on the fannish side of the spectrum and used a similar system of human-facilitated tag wrangling: LibraryThing (a website where you can list all your books) and Danbooru (an anime imageboard). But, we might ask ourselves, why use humans? Couldn't machine learning or AI or another hot tech buzzword wrangle the tags instead?

One reason for the humans is that AO3 began developing its routines in 2007, when the tech wasn't as advanced and they had a lot of willing volunteers. But even now, tag wranglers are skeptical that a machine could take over their tasks. One wrangler, who goes by the handle spacegandalf, pointed me to the example of a character from an audio drama called The Penumbra Podcast who didn't have an official name in text for several episodes after he was introduced. Yet people were writing fanfic—and trying to tag it by character—before they had any name to tag it with.

Because spacegandalf had listened to this podcast—AO3 deliberately recruits and assigns tag wranglers who are members of the fandoms that they wrangle for—they had the necessary context to know that "Big Guy Jacket Man Or Whatever His Name Is" referred to the same person as his slightly more official moniker "the Man In the Brown Jacket" and his later, official name, Jet Sikuliaq (and that none of these names should be confused with a different mysteriously named character from a different audio drama, the Man in the Tan Jacket from Welcome to Night Vale).

With all these tags properly wrangled, I can not only find "Big Guy Jacket Man" and "the Man in the Brown Jacket" and "Jet Sikuliaq" all in the same search results, but I can also drill down and search for crossover fic containing both the Man in the Brown Jacket and the Man in the Tan Jacket—and, one hopes, an entire world of colored-jacketed friends. Sadly, there is none, but at least I know I have a conclusive answer.

Without tag wranglers, I'd be stuck doing an ordinary search for "jacket" or "jacket man"—the first of which gives me hundreds of results about other irrelevant characters who happen to wear a jacket this one time, and the second of which misses some genuinely relevant results about our jacket men of interest.
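The difference between the two kinds of search can be sketched in a few lines of Python. This is a toy comparison, not how AO3's search is built: the work ids, the irrelevant "borrows a jacket" tag, and the function names are invented for illustration, and the synonym table stands in for decisions a wrangler would make.

```python
# Hypothetical works and their free-form character tags.
works = {
    "fic-a": ["Big Guy Jacket Man Or Whatever His Name Is"],
    "fic-b": ["the Man In the Brown Jacket"],
    "fic-c": ["Jet Sikuliaq"],
    "fic-d": ["Someone borrows a jacket"],  # invented, irrelevant to the character
}

# Stand-in for wrangler decisions: every known name maps to one canonical tag.
synonyms = {
    "big guy jacket man or whatever his name is": "jet sikuliaq",
    "the man in the brown jacket": "jet sikuliaq",
    "jet sikuliaq": "jet sikuliaq",
}


def keyword_search(query):
    """Plain substring match on tag text, with no wrangling."""
    q = query.lower()
    return sorted(w for w, tags in works.items()
                  if any(q in t.lower() for t in tags))


def wrangled_search(query):
    """Match on the canonical tag that each raw tag resolves to."""
    target = synonyms.get(query.lower(), query.lower())
    return sorted(w for w, tags in works.items()
                  if any(synonyms.get(t.lower(), t.lower()) == target for t in tags))


print(keyword_search("jacket"))         # ['fic-a', 'fic-b', 'fic-d']
print(keyword_search("jacket man"))     # ['fic-a']
print(wrangled_search("Jet Sikuliaq"))  # ['fic-a', 'fic-b', 'fic-c']
```

The keyword search for "jacket" picks up the irrelevant fic and still misses the one tagged with the character's real name, while "jacket man" finds only one of the three. The wrangled search returns exactly the three relevant works.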

Another of the Tag Wrangling Chairs, Qem, also thinks that machine tag wrangling is unlikely, pointing to machine translation as a cautionary tale. "There are terms in fandom which, while commonly understood in context among fans, would not be when you take it out of the fandom context," Qem says. For example, seemingly innocuous words like "slash" and "lemon" do not refer to a punctuation mark or a citrus fruit in fannish contexts, and tag wranglers are already well aware that machine translation can only manage the literal, not the subcultural meanings. Qem's co-chair, briar_pipe, is slightly more sanguine: "I personally think it might be interesting to have AI/human partnerships for this type of data work, but you have to have humans who are aware of AI limitations and willing to call AIs on mistakes, or else that partnership is useless."

AI certainly does have limitations. There always seems to be a new report of products that claim to be AI (Amazon's Mechanical Turk, Facebook's M, Google Duplex, the Expensify receipt scanner) but that in fact often involve hordes of poorly paid, invisibilized humans performing "ghost work" that is attributed to AI.

The tag wranglers on AO3 aren't paid at all. The archive's parent organization, the Organization for Transformative Works, is a nonprofit, and everyone involved in the project is a volunteer. But it's also hard to consider them "exploited" like the fake-AI humans. Wranglers are more like the volunteers who edit Wikipedia or moderate Facebook groups. Rather than working for a faceless corporation that would rather pretend they're machines, these volunteers benefit from the same communities they serve. This community-oriented nature is at the heart of AO3's success—it was created by fans who got tired of the capricious takedown policies of for-profit fanfiction hosting sites and decided to buy their own servers, teach themselves how to code, and create a site that was exactly what they wanted, including an incredibly functional tagging system that runs rings around both professional databases and billion-dollar social platforms.

When technologists lament the increasing dominance of the internet by a few large corporations, there's a tendency to look for counterinspiration, if you will, in collaborative projects like Wikipedia or open source software. But fans have also been freely creating things for each other since the very early days of the internet, and fandom contains many people from demographics underrepresented on these more frequently analyzed projects—perhaps both a reason for the success of Archive of Our Own and a reason that this success has been overlooked. Whether it wins the Hugo or not, this nomination is one step toward bringing AO3 the attention it deserves.

Updated 7-14-19, 8 pm EST: This story was updated to correct the number of tags AO3 has wrangled in recent years.