Retail AI: Death By A Thousand Cuts

When people say “AI software,” what do they really mean? This is a critically important question for understanding the impact that AI can have on the retail industry (on any industry, really), and the lack of a clear answer is a significant source of the gap between AI hype and reality.

While McKinsey sorts AI into classification, prediction, and generation, I’ve found it more useful to look at natural language processing, computer vision, and prediction. But each of these is an umbrella term for lots of what amount to “micro-capabilities,” which is an important limitation to keep in mind when thinking about AI. Artificial General Intelligence (AGI, sometimes referred to as Human-Level Artificial Intelligence) is at least 20 years off, according to experts in the field, and possibly 50. And both ends of that estimate assume some kind of breakthrough that has not yet happened: there is no currently known path to achieving AGI.

What Kind of AI Are You Talking About?

So when the retail industry talks about AI, the first question you have to ask is, “What kind of AI do you mean?” Baidu, one of the Chinese leaders in AI capabilities, last year released over 100 use cases available through its Baidu Brain platform. When you go through the list (be prepared to spend some time with Google Translate if you don’t read Chinese), it becomes immediately clear that the use cases available are extremely “small” in their capability. For example, under the umbrella of Natural Language Processing (Baidu does not really classify its use cases, so these are my attempts at categorization), the list includes speech recognition, speech synthesis, voice wake-up, text recognition, advertising detection, business card identification, passport identification, license plate recognition, form text recognition, lexical analysis, word similarity, text correction, emotional tendency analysis, conversational emotion recognition, article classification, universal translation API, voice translation API, search analysis, and intelligent call center, among many others.

Baidu doesn’t specify a definition for each of these capabilities, so I’m going to have to make some guesses based on the names, but to me, “intelligent call center” is the most advanced of the NLP capabilities in this list. Most likely, it is a combination of several of the capabilities listed, like speech recognition, speech synthesis, conversational emotion recognition, and lexical analysis. And each of these is most likely an independent capability that has to be combined with the others, rather than one single engine that is capable of all of these things at once. An AI that could, through one set of algorithms, learn how to detect what a person is saying and how they’re saying it, assign a context to it, and then marshal an appropriate response would be extremely close to an AGI, and per above, we’re a long way away from that.

More likely, an intelligent call center is a bunch of different algorithms assembled into a logical order. Each algorithm assesses an input, produces an output based on what it is designed to detect or predict, and then attaches a probability or confidence score that the output is correct.

That probability aspect is often overlooked when discussing AI. If a conversational emotion recognition algorithm is only 50% sure that the customer talking to an automated phone tree is angry, should you act on that probability? Would you immediately route all customers with a 50% score of possible anger to a live agent? 70%? Before you can answer that question, you have to consider the consequences of both kinds of error. Treating “angry” as the thing being detected, a false positive (a Type I error) classifies a not-angry customer as angry, and a false negative (a Type II error) classifies an angry customer as not angry. With false negatives, you might end up losing customers with a high lifetime value because you didn’t handle them with the sensitivity they needed, and with false positives, you might end up giving away concessions to customers who didn’t need them. Either way, there’s a cost.
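To make that trade-off concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the eight scored calls, the two dollar figures, and the function name `total_cost` do not come from Baidu or from any real call center. It simply adds up what each routing threshold would cost once you put a price on both kinds of mistake.

```python
# Hypothetical scored calls: (model's anger score, whether the caller really was angry).
# These values are invented purely for illustration.
calls = [
    (0.95, True), (0.80, True), (0.65, False), (0.55, True),
    (0.45, False), (0.30, False), (0.20, True), (0.10, False),
]

# Assumed cost, in dollars, of each kind of mistake.
COST_MISSED_ANGRY = 200.0  # angry caller left in the phone tree (false negative)
COST_FALSE_ALARM = 15.0    # calm caller routed to a live agent (false positive)


def total_cost(threshold, scored_calls):
    """Sum the cost of every misclassification at a given routing threshold."""
    cost = 0.0
    for score, is_angry in scored_calls:
        routed_to_agent = score >= threshold
        if is_angry and not routed_to_agent:
            cost += COST_MISSED_ANGRY
        elif not is_angry and routed_to_agent:
            cost += COST_FALSE_ALARM
    return cost


for threshold in (0.5, 0.7, 0.9):
    print(f"threshold {threshold:.1f}: cost ${total_cost(threshold, calls):.2f}")
```

With these made-up numbers the 50% threshold is the cheapest of the three, because a missed angry customer is assumed to cost far more than an unnecessary handoff. Flip the two cost figures and the answer flips with them, which is the point: the right threshold depends on the business costs, not on the model alone.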

And then you have to ask yourself, how does the AI’s prediction of angry vs. not-angry compare to how well a human could predict it in the first place? Depending on your call volume, it might in the end be cheaper to have humans randomly screen phone tree interactions, rather than inserting an AI into the mix. Given the volumes and scalability you can find for AI, I honestly doubt that this would be the case, but it’s still a question worth asking.

Now add in the multiple algorithms that would be required for a truly “intelligent call center” capability. You have a speech recognition algorithm that is detecting speech and assigning a probability that the speech it heard was a specific set of words. Then there is a conversational emotion algorithm that is listening to the speech and assigning a probability of a specific emotional context to the words. Simplified, we’re focusing on angry vs. not-angry, but there could be a whole range of emotions in that mix. And then there is a lexical algorithm that makes sure (with its own probability score) that the words detected make logical/grammatical sense. And then something has to take the output of the lexical analysis and assign meaning (with its own probability score) to the words that the lexical algorithm confirmed or corrected. (Speech synthesis, despite sitting on the same list, does none of this interpreting: it only turns a chosen response back into spoken words.)

And, for all of this analysis to be useful, there would have to be some kind of prediction engine at the end of this that predicts what it is the customer is looking for (with a probability score that it’s a good prediction) and makes a recommendation for what kind of response would be appropriate (with its own score). The appropriate response would have to be more than just an identification of what the customer wants out of the interaction – it would have to be constrained by what the company is willing to do to appease the customer, both based on general policies and on the customer’s specific circumstances or long-term value to the company.

AI Capabilities vs. AI Applications

The call center example is an exploration of what’s really underneath the covers of an “AI application”. More often than not, there are multiple AI capabilities that are being used, each powered by a different kind of AI engine, each with its own probability score that says how confident the AI is in its output, and some kind of threshold assigned that says when an output should be passed on to be used by the next algorithm down the line, or when an input should be diverted from the process and passed on to a human for review.
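Here is a rough sketch of that chaining logic in Python. It is not how Baidu or any vendor actually builds this; the `Stage` class, the `run_pipeline` function, the stage names, and every confidence number and threshold are hypothetical stand-ins. The stages are stubbed out with fixed outputs so the control flow is visible: each stage reports a confidence, and the first one to fall below its threshold diverts the call to a human.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Stage:
    name: str
    run: Callable[[str], Tuple[str, float]]  # returns (output, confidence)
    min_confidence: float                    # below this, divert to a human


def run_pipeline(stages, payload):
    """Pass the payload through each stage; hand off if any stage is unsure."""
    for stage in stages:
        payload, confidence = stage.run(payload)
        if confidence < stage.min_confidence:
            return ("human_review", stage.name, payload)
    return ("automated", None, payload)


# Stub stages with fixed, invented outputs and confidences.
stages = [
    Stage("speech_recognition", lambda audio: ("i want a refund", 0.92), 0.80),
    Stage("emotion_recognition", lambda text: (text + " [angry]", 0.55), 0.70),
    Stage("intent_prediction", lambda text: ("offer_refund", 0.88), 0.75),
]

result = run_pipeline(stages, "<audio>")
print(result)  # ('human_review', 'emotion_recognition', 'i want a refund [angry]')
```

In this sketch the emotion stage is only 55% confident against a 70% threshold, so the call never reaches the recommendation step. Where each threshold sits is a business decision, not something the algorithms decide for you.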

Other prediction-type AI algorithms don’t break down into discrete steps like this. Some prediction engines don’t take explicit, inspectable steps before spitting out an output. An algorithm that decides that the demand profile of Shamrock Shakes in March in Chicago looks a lot like the demand profile of Pumpkin Spice Lattes in October in Denver is, behind the scenes, defining a set of attributes for each of the products, and then deciding that those attributes are shared enough to say that the two products should be assigned the same demand curve over time. But a lot of the prediction engines out there aren’t exposing those attributes. In fact, the owners and operators of the algorithm may not themselves know or be able to articulate the commonalities that led the algorithm to say that the two products should be assigned the same demand curve, even though they are two completely different products in two completely different timeframes and seasonal contexts. The attributes assigned may be “fuzzy” and not able to be articulated in terms that a human would understand.
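One way to picture what “shared enough” might mean is a similarity score between attribute vectors. The sketch below is purely illustrative: the four-number vectors, the product variable names, the 0.95 cut-off, and the choice of cosine similarity are all my assumptions, not a description of how any particular demand-forecasting engine works. Note that the numbers carry no labels, which is exactly the interpretability problem.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical learned attribute vectors; the dimensions have no human-readable names.
shamrock_shake = [0.9, 0.1, 0.8, 0.3]
pumpkin_spice_latte = [0.8, 0.2, 0.9, 0.2]
bottled_water = [0.1, 0.9, 0.0, 0.7]

SAME_CURVE_THRESHOLD = 0.95  # assumed cut-off for sharing a demand curve

for name, other in [("pumpkin_spice_latte", pumpkin_spice_latte),
                    ("bottled_water", bottled_water)]:
    score = cosine_similarity(shamrock_shake, other)
    shares = score >= SAME_CURVE_THRESHOLD
    print(f"shamrock_shake vs {name}: {score:.3f}, same curve: {shares}")
```

With these invented vectors the shake and the latte land on the same curve and the bottled water does not, but nobody looking at the four numbers could tell you why.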

On the one hand, not imposing a human definition of attributes could potentially lead to a better answer than if you insist on boxing the analysis into a set of attributes that humans could understand. When Google’s neural machine translation system reportedly developed its own internal representation to bridge between English, Japanese, and Korean (the AI was taught English to Korean and then English to Japanese, and then asked to translate between Korean and Japanese without using English as an intermediary), the result was not some new pidgin language that a human would understand, but rather an almost intrinsic grouping of concepts that helped the machine assign a translation (with a probability score, don’t forget that part). Researchers overseeing the project had to go in and interpret the models the machine learning created to help it answer the translation questions it was being asked; it wasn’t like the machine could output exactly the reasoning that went into how it arrived at a translation.

Death By A Thousand Cuts In Confidence

Either way, whether you have an “application” that is really a loose collection of AI algorithms running through a logical sequence, or one AI algorithm that is making a decision but is not able to expose its reasoning in terms humans can understand, if you assume that AI is an end-to-end capability that you’re implementing, you’re liable to run into trouble.

With the loose collection, stacking up a bunch of decisions that are each scored at some probability of being right can lead to a false sense of security about the actual “correctness” of the output being recommended: every stage can look confident on its own while the chance that the whole chain is right is considerably lower. On the other hand, with the more intrinsic solutions, not being able to expose the assumptions being made undermines the ability of an organization to trust in the recommendations being made.
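The stacking effect is easy to put numbers on. The sketch below assumes, for simplicity, that each stage’s errors are independent of the others, which real systems won’t strictly satisfy, and the stage names and per-stage confidences are invented for illustration. Under that assumption the chance that every stage is right is just the product of the individual probabilities.

```python
import math

# Invented per-stage probabilities of being correct, for illustration only.
stage_confidence = {
    "speech_recognition": 0.95,
    "emotion_recognition": 0.80,
    "lexical_analysis": 0.90,
    "meaning_assignment": 0.85,
    "response_recommendation": 0.80,
}


def chain_confidence(confidences):
    """Probability that all stages are correct, assuming independent errors."""
    return math.prod(confidences)


overall = chain_confidence(stage_confidence.values())
print(f"weakest single stage: {min(stage_confidence.values()):.0%}")
print(f"whole chain correct:  {overall:.1%}")
```

No single stage here is below 80%, yet under the independence assumption the chain as a whole is right less than half the time. Anyone who looks only at the last stage’s score will badly overestimate how much to trust the recommendation.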

Either way, retailers are risking an AI “death by a thousand cuts” – little instances of probabilities missed or answers taken at full value that should’ve been discounted, or at least further explored – that undermine employees’ trust in the value of what’s being predicted.

I think AI algorithms can overcome these challenges, but it’s yet another area that needs to be addressed before the AI reality can come close to matching the hype.
