Archaeology’s Information Revolution

In the near future, every archaeological artifact could be digitally connected to every other artifact.

The Temple of Bel in the historical city of Palmyra, Syria, photographed in August 2010 (Sandra Auger / Reuters)

Archaeology, as a way of examining the material world, has always required a certain deftness in scale. You have to be able to zoom in very close—at the level of, say, a single dirt-encrusted button—then zoom out again to appreciate why that one ancient button is meaningful.

Any given artifact is simultaneously at the center of its own history, and representative of a much larger story, too.

“Chipped-stone hand axes made hundreds of thousands of years ago and porcelain teacups from the 18th century carry messages from their makers and users,” wrote the archaeologist and historian James Deetz in his book, In Small Things Forgotten. “It is the archaeologist’s task to decode those messages and apply them to our understanding of the human experience.”

Carrying out that task is now possible in ways that were, until very recently, barely imaginable. As the digital and physical worlds collide, archaeology is changing—not just in practice but in scale. A huge database of Biblical-era pottery, for example, means an archaeologist in Jordan can find a shard of pottery from the Iron Age and, in minutes, query how that one fragment of clay connects to every other excavation site in the Holy Land. The frameworks scholars now use to piece together the past are increasingly built from billions upon billions of overlapping data points.
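The kind of lookup such a database enables can be sketched in a few lines: match one sherd's recorded attributes against records from many sites. The field names, site names, and attribute values below are invented for illustration, not drawn from any actual pottery database.

```python
# Hypothetical records of the sort a pottery database might hold.
RECORDS = [
    {"site": "Site A", "period": "Iron Age",   "fabric": "red slip"},
    {"site": "Site B", "period": "Iron Age",   "fabric": "red slip"},
    {"site": "Site C", "period": "Bronze Age", "fabric": "buff ware"},
]

def matching_sites(records, **attributes):
    """Return the sites whose records share every given attribute."""
    return sorted({
        r["site"] for r in records
        if all(r.get(k) == v for k, v in attributes.items())
    })

# A single Iron Age red-slip fragment links two sites in this toy data.
print(matching_sites(RECORDS, period="Iron Age", fabric="red slip"))
# → ['Site A', 'Site B']
```

The value of the query is exactly what the article describes: one physical fragment becomes a key into every other site where its type appears.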

“I’ve lived through this data transition,” said Thomas Levy, an archaeologist and anthropology professor at the University of California, San Diego. “We still dig like our 19th-century predecessors—with trowels, rubber buckets, shovels, toothbrushes, and so on. But we used to be really encumbered by our ability to record data. We had to be more selective. Now, with these digital tools—with GPS, total stations, laser scanning, and structure-from-motion photography—we can collect an unlimited amount of data.”

Levy, who helped build the Pottery Informatics Query Database, says his excavations went “totally digital” around 1999. He’s been collecting massive data troves on digs ever since. He’s also experimenting with 3-D visualizations made from data collected at excavation sites—visualizations that can be projected onto physical spaces and might eventually be accessible via virtual reality headsets. The result: people can have the experience of walking through archaeologically significant sites without actually traveling to them. With enough data, people can even have the experience of walking through structures that no longer exist.

“Within the archaeological setting, the site—and the more control we have over space—the more meaningful our observations are,” Levy told me. In other words, the more precision with which researchers can describe an artifact’s physical place in the world, the more value historians can extract from that object and others related to it. Imagine, for instance, an ancient mining site from the time of King Solomon. Someone like Levy might excavate a five-meter-by-five-meter trench through a slag mound where an ancient smelting operation once took place. During that excavation, he and his colleagues would record the geospatial coordinates for every single find—every last ingot fragment, or copper axe, or furnace remnant.

“We’re collecting billions of those data points,” he told me. “And then we sort of mesh them all together and we have not only a 3-D model of the actual excavation from this Biblical period, but we also have a kind of digital data-scaffold in which to embed all the archaeological data points.”
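In miniature, the “data scaffold” Levy describes is just a set of labeled points in space. The sketch below—with coordinates and find labels invented for the example—shows the simplest operation such a point set supports: recovering the spatial extent of a trench from the finds logged inside it.

```python
# Each find is logged with a label and 3-D coordinates (meters,
# relative to a site datum). All values here are illustrative.
finds = [
    ("ingot fragment",  12.41, 3.07, -1.62),
    ("copper axe",      12.45, 3.11, -1.80),
    ("furnace remnant", 11.98, 2.95, -2.10),
]

def bounding_box(points):
    """Spatial extent of a set of (label, x, y, z) finds: (mins, maxs)."""
    xs, ys, zs = zip(*[(x, y, z) for _, x, y, z in points])
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

lo, hi = bounding_box(finds)
print(lo, hi)  # the volume of earth these finds occupy
```

A real excavation dataset would carry far richer metadata per point—stratigraphic layer, material, date of recovery—but the principle is the same: every artifact is anchored to a coordinate, and the model is built from the accumulated points.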

Thanks to satellite data, those data points can now be embedded within a topography of the entire planet. For instance, Sarah Parcak, a space archaeologist, analyzes satellite imagery of Earth, looking for telltale features that might signal a long-lost historical site. Here’s how Wired described her process:

When looking for new archaeological sites, Parcak orders satellite imagery for parcels of land ranging from 65×65 to 165×165 feet square. Then she applies filters to highlight different portions of the electromagnetic spectrum in each image. She’s looking for features that may hint at what’s buried underground. A hallmark clue is the condition of surface vegetation. An architectural structure buried underground can stunt the growth of the flora above it, creating a dead zone—invisible to the naked eye, but detectable in short wave infrared images—in the shape of the underlying infrastructure. In places like Egypt, where vegetation is scarce, satellite imagery can help Parcak distinguish between natural and man-made materials like the mud bricks many tombs are made of.
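The vegetation-stress signal Parcak looks for can be approximated with a standard vegetation index. The toy example below uses NDVI, which compares near-infrared and red reflectance per pixel (the article mentions short-wave infrared; NDVI is a related, widely used stand-in). The band values and threshold are invented for illustration.

```python
# Toy reflectance grids for two spectral bands (values in [0, 1]).
near_infrared = [
    [0.62, 0.60, 0.61],
    [0.60, 0.31, 0.59],   # low center value: stunted growth overhead?
    [0.61, 0.59, 0.62],
]
red = [
    [0.12, 0.11, 0.12],
    [0.11, 0.20, 0.12],
    [0.12, 0.12, 0.11],
]

def ndvi(nir, r):
    """Normalized difference vegetation index for one pixel."""
    return (nir - r) / (nir + r)

def stressed_pixels(nir_band, red_band, threshold=0.4):
    """Coordinates of pixels whose NDVI falls below the threshold."""
    return [
        (row, col)
        for row in range(len(nir_band))
        for col in range(len(nir_band[row]))
        if ndvi(nir_band[row][col], red_band[row][col]) < threshold
    ]

print(stressed_pixels(near_infrared, red))  # → [(1, 1)]
```

Healthy vegetation reflects strongly in the near-infrared; plants stunted by buried masonry reflect less, so a cluster of low-index pixels in a structured shape is the kind of anomaly worth excavating.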

It’s mind-boggling to think of the amount of data now flowing into the annals of archaeology. But the same thing that makes all this data useful—the sheer volume of information—presents difficult new challenges. Archaeologists aren’t yet sure about the best way to preserve these datasets, and they don’t yet know how, or in what format, to share them across networks.

Lots of people are looking for answers. Levy and his colleagues at several California universities are building a network that contains information from tens of thousands of archaeological sites. And there are other resources, like the Mediterranean Archaeology Network, which contains a series of linked archaeological nodes—which in turn contain regional databases for researchers to query. But the more important question of how to be a steward for massive, scholarly datasets is part of a larger conversation among information scientists that could end up redefining the library as we know it.

All this reflects a profound shift in how human knowledge will be contextualized, stored, and shared as reams of data continue to grow. Increasingly, people are looking for ways to classify and connect datasets. The Library of Congress, for example, is designing a new cataloguing system—for the first time in 40 years—optimized for the semantic web. Today, most of the world’s biggest libraries use an electronic filing system called MARC records, the standard that replaced physical card catalogues in the 1970s. The idea for the next generation of organizing library collections is to have a system that recognizes many more fields of metadata than ever before—and finds connections to other resources both within and outside of any individual institution.

So instead of just listing books and documents by “title,” “author,” “key words,” “genre,” and other basic fields, libraries are thinking about how to be far more descriptive about individual titles and far more comprehensive about how resources connect to one another. They’re also trying to figure out how to handle huge digital assets like datasets—everything from historic climate records to census data to satellite images to geospatial coordinates from archaeological excavations, and so on. I’ve interviewed several librarians who are seriously thinking about how to make this kind of information accessible to those who need it—these are people who are reshaping institutions like the Library of Congress, and Oxford, and Yale, and Harvard—and they all say that huge datasets will transform the fundamental functions libraries serve.

“A library is not a big box filled with books,” said Catherine Murray-Rust, the dean of libraries at Georgia Tech. “It is not just a study hall. Going back to the notion of a library from the past, it is really a space—and today a physical and virtual space—in which people can appreciate the scholarship of the past while they create the scholarship of the future.”

Georgia Tech is in the midst of a major renovation of its library system, an overhaul that will include removing many of the books from public spaces. (Print materials that are removed will still be retrievable upon request.) As the project has moved forward, Murray-Rust says the team working on the new library system has gotten “more radical in our thinking about what a library should be.”

“The huge issue now is data,” she said. “It’s probably more important than text. We have traditional reading rooms where there actually are a few books. Books are a tremendous visual cue to people about the seriousness of the space. We love the book, as technology, but we also know it is not the only—and in some fields not the best—vessel for content. This is particularly true with data: The book doesn’t work terribly well.”

Murray-Rust calls data “the new frontier” of human knowledge. She and others agree that data is changing entire industries and academic specialties so quickly that key information is bound to be lost before best practices are standardized. This is perhaps inevitable, but it represents more than just a missing piece of knowledge. People often talk about data points as if they’re conjured from thin air, somehow non-existent until they’re part of a larger set. And though it’s true that meaning arises from assembling great constellations of data, the data itself usually begins in the material world.

Among archaeologists, the datasets collected today—and the visualizations made from that data—may be all that exists after great structures have crumbled.

“Having Palmyra real is much more important than having 3-D models of it, obviously,” Levy told me, referring to the ancient city where several historic sites have been destroyed by ISIS in recent months. “But in a world where we have so much intentional destruction of cultural heritage, we’re in a position now to record it in ways that were impossible even a decade ago.”

Adrienne LaFrance is the executive editor of The Atlantic. She was previously a senior editor and staff writer at The Atlantic, and the editor of TheAtlantic.com.