How Microsoft’s Cortana will take digital personal assistants to the next level

Cortana puts a friendly face on a fleet of Bing services by learning about you.

SAN FRANCISCO—When Microsoft unveiled its Cortana “digital personal assistant” technology this week, some dismissed it as a far-too-late answer to Apple’s Siri and Google Now. But aside from a female voice and some functional overlap, Cortana is not strictly an answer to either of those voice-based tools—it is an answer to broader questions about how to wield the power of cloud computing services in a personal, non-intrusive way.

Cortana is just the first big pay-off from Microsoft’s continuing investment in the Bing platform as well as a host of big-data technologies that could have wide-ranging impact on how people interact with information, applications, and the world around them.

Microsoft clearly has plans for Cortana that go far beyond Windows Phone. At Build, Microsoft executives showed how some of Cortana’s personalization features can be exposed in Bing itself, and it seems almost inevitable that they’ll also be plugged into the Windows, Office 365, and Azure platforms. Bing’s search APIs are already used by Siri, and given the way Cortana was built—a relatively thin client application exploiting local device interfaces, backed by a massive amount of cloud computing power—components of the system could easily find their way into applications for other devices, including those running iOS and Android.

All Microsoft really needs to do is get Cortana to learn how to behave. It's still learning a few things, as my brief hands-on at Build revealed.

Cloud first

In a Build presentation, members of the Bing product team drilled down into Cortana’s architecture to show how the system was a result of living the “cloud first, mobile first” mantra emphasized by new Microsoft CEO Satya Nadella. Cortana is, in the words of Microsoft director of search Stefan Wietz, “an orchestration layer that fires off [services] based on the intent of the user.” And that orchestration layer lives mostly in the cloud.

There are elements of Cortana that live only on Windows Phone—at least for now. One of those is the “Notebook,” a data store of all the user’s preferences, interests, and most important contacts and places. Some details need to be explicitly entered into the Notebook or other applications in order to be used, while others come from discovery of structured data within e-mail messages in the user’s inbox, such as meetings and flight reservations.

To make Cortana less “creepy,” users can edit out items from the Notebook that they’re not comfortable with Cortana tracking, or that might be the result of coincidental pattern recognition. The local code also handles “geofencing” for requests like “Remind me to call my wife when I get to work.” Here, Cortana uses known locations from the Notebook as well as geolocation data to determine when the condition is met and the reminder should fire.
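As a rough illustration of the geofencing idea, and not Microsoft's actual code, the check can be reduced to comparing the device's position against a saved place. In the minimal Python sketch below, the `Place` and `GeofenceReminder` names, the coordinates, and the 150-meter radius are all hypothetical; the distance is computed with the standard haversine formula.

```python
import math
from dataclasses import dataclass
from typing import Optional

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


@dataclass
class Place:
    """A saved location, standing in for a Notebook entry (hypothetical)."""
    name: str
    lat: float
    lon: float
    radius_m: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


class GeofenceReminder:
    """Fires its message once, the first time the device enters the place."""

    def __init__(self, place: Place, message: str) -> None:
        self.place = place
        self.message = message
        self.fired = False

    def update(self, lat: float, lon: float) -> Optional[str]:
        """Feed a new location fix; return the message if the fence was just entered."""
        distance = haversine_m(self.place.lat, self.place.lon, lat, lon)
        if not self.fired and distance <= self.place.radius_m:
            self.fired = True
            return self.message
        return None


# Hypothetical usage: "work" is a made-up saved place.
work = Place("work", 37.7840, -122.4010, radius_m=150.0)
reminder = GeofenceReminder(work, "Call my wife")
```

Each new location fix is passed to `reminder.update(lat, lon)`; the reminder text comes back exactly once, when a fix first lands inside the radius.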

But most of the local Cortana code is there just to push data from the user’s context back to the cloud. The rest of Cortana leans heavily on Bing cloud components.

For example, Cortana uses Bing’s speech recognition to convert speech to text, and it relies on Bing’s natural language engine to process that text into a query or command. (The speech recognition capability is also available as a component for Windows 8 and Windows Phone developers.) Wietz said that Bing anonymizes and retains voice inputs to the speech recognition system for about 30 days to help engineers understand why things go wrong when the neural network flubs the parsing of a sentence. And Bing live-streams the results it gets from that parsing back as text to the user, so that it's easy for the user to catch when they've been misheard.

A conversation management service hosted in Bing’s infrastructure helps maintain the context of voice conversations with the service, allowing you to “chunk up” a query into multiple parts, as Wietz put it, and allowing you to ask follow-on questions so that Cortana will (usually) be able to infer context from previous questions.
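One simple way to picture that context carry-over is slot filling: a follow-on question inherits whatever details it leaves unstated from the previous turn. The Python sketch below is only an illustration under that assumption; the `ConversationContext` class and the `intent`, `location`, and `date` slot names are hypothetical and do not describe how Bing's service works internally.

```python
from typing import Dict, Optional

Slots = Dict[str, Optional[str]]


class ConversationContext:
    """Keeps the slots from the last turn so follow-on queries can reuse them."""

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}

    def resolve(self, parsed: Slots) -> Dict[str, str]:
        """Merge a newly parsed query with remembered context.

        Slots the user stated override remembered ones; slots left as None
        are filled from the previous turn. A different intent starts fresh.
        """
        stated = {k: v for k, v in parsed.items() if v is not None}
        new_intent = stated.get("intent")
        if new_intent is not None and new_intent != self._last.get("intent"):
            self._last = {}  # topic changed, so drop the old context
        merged = {**self._last, **stated}
        self._last = merged
        return merged


ctx = ConversationContext()
# "What's the weather in Seattle today?"
first = ctx.resolve({"intent": "weather", "location": "Seattle", "date": "today"})
# "What about tomorrow?" states only the date; the rest is inferred.
second = ctx.resolve({"intent": None, "location": None, "date": "tomorrow"})
```

Here `second` ends up as a weather query for Seattle tomorrow, even though the follow-on question named neither the topic nor the city.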

A Microsoft slide describing the components of Cortana.

Learning everything

To respond to questions about the world that don’t have answers within the Notebook, Cortana uses Bing’s vast store of knowledge. Much of the semantic power of Cortana is the result of Bing’s “entities” database, which Microsoft initially introduced nearly two years ago.

As Microsoft has built up its collection of entity definitions (with help from those who have added schema information to their Web content through efforts like Schema.org) and added streams of structured information from other sources, the database has become an ever-expanding well of knowledge. This allows Cortana to go beyond simply processing a natural language query into a Web search, instead identifying other things that can be done with the results, such as making a reservation at a restaurant.

On top of the entity capabilities, Microsoft has added an army of stream processing servers to watch for events in Web content and other data sources as they happen. Based on user data, for example, Cortana could subscribe to stream data about an airline flight from an itinerary in the user’s e-mail, a discovered “preference” that the user can delete or approve. Microsoft is running hundreds of millions of “standing queries,” specific stream processing requests that trigger an alert when matching data is discovered, according to Savas Parastatidis, a software architect on the Bing team. These queries can generate alerts for “tens of millions of users,” he said, and the system can be scaled up to billions of queries to serve hundreds of millions of concurrent users.
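A standing query inverts the usual search pattern: the query is stored, and data flows past it. The Python sketch below is a toy, single-process illustration of that idea, not a description of Bing's stream processing system. The `StandingQuery` and `StandingQueryEngine` names, the topic keys, and the event fields are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Event = Dict[str, str]


@dataclass(frozen=True)
class StandingQuery:
    """A stored request: alert this user when an event on the topic matches."""
    user_id: str
    topic: str                          # hypothetical key, e.g. "flight:XX100"
    predicate: Callable[[Event], bool]  # condition the event must satisfy


class StandingQueryEngine:
    """Indexes queries by topic so each event is checked only against relevant ones."""

    def __init__(self) -> None:
        self._by_topic: Dict[str, List[StandingQuery]] = defaultdict(list)

    def subscribe(self, query: StandingQuery) -> None:
        self._by_topic[query.topic].append(query)

    def unsubscribe(self, user_id: str, topic: str) -> None:
        """Mirrors the user deleting a discovered preference."""
        self._by_topic[topic] = [q for q in self._by_topic[topic] if q.user_id != user_id]

    def publish(self, topic: str, event: Event) -> List[Tuple[str, Event]]:
        """Run an incoming event past the stored queries; return (user, event) alerts."""
        return [(q.user_id, event) for q in self._by_topic.get(topic, []) if q.predicate(event)]


engine = StandingQueryEngine()
engine.subscribe(StandingQuery("alice", "flight:XX100", lambda e: e["status"] == "delayed"))
alerts = engine.publish("flight:XX100", {"status": "delayed"})
```

Indexing by topic is the design choice that matters here: an incoming flight update is compared only with the queries registered for that flight, rather than with every stored query.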