Using Deep Learning To Measure The Facial Emotion Of Television


Deep learning is increasingly capable of assessing the emotion of human faces, looking across an image to estimate how happy or sad the people in it appear to be. What if this could be applied to television news, estimating the average emotion of all of the human faces seen on the news over the course of a week? While AI-based facial sentiment assessment is still very much an active area of research, an experiment using Google’s cloud AI to analyze a week’s worth of television news coverage from the Internet Archive’s Television News Archive demonstrates that even within the limitations of today’s tools, there is a lot of visual sentiment in television news.

To better understand the facial emotion of television, a week of coverage from CNN, MSNBC and Fox News, along with the morning and evening broadcasts of San Francisco affiliates KGO (ABC), KPIX (CBS), KNTV (NBC) and KQED (PBS), from April 15 to April 22, 2019, totaling 812 hours of television news, was analyzed using Google's Vision AI image understanding API with all of its features enabled, including facial detection.

Facial detection is very different from facial recognition. It merely registers that a human face is present in an image; it does not attempt to discern who that person is. Across the board, Google's visual APIs permit only facial detection; none offer facial recognition.

For each detected face, Google's API also estimates the likelihood that it expresses each of four emotions: joy, surprise, sorrow and anger.
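The API reports each of those four emotions as a likelihood bucket rather than a score. The article does not state what threshold was used to count a face as "depicting" an emotion, so the sketch below assumes a cutoff of LIKELY or above; the `face` dictionary is an illustrative stand-in for the API's per-face annotation fields.

```python
# Sketch: deciding whether a face "depicts" an emotion from the Vision AI
# likelihood buckets. The LIKELY-or-above threshold is an assumption; the
# article does not say how a face was counted as depicting an emotion.
LIKELIHOOD = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
              "POSSIBLE", "LIKELY", "VERY_LIKELY"]
RANK = {name: i for i, name in enumerate(LIKELIHOOD)}

def depicts(face, emotion, threshold="LIKELY"):
    """True if the face's likelihood for the emotion meets the threshold."""
    return RANK[face[emotion]] >= RANK[threshold]

# Illustrative per-face annotation (not real API output).
face = {"joy": "VERY_LIKELY", "surprise": "POSSIBLE",
        "sorrow": "VERY_UNLIKELY", "anger": "UNKNOWN"}
print(depicts(face, "joy"))       # True
print(depicts(face, "surprise"))  # False at the LIKELY threshold
```

Because each emotion gets its own likelihood, a single face can clear the threshold for more than one emotion at once, which is why the four emotions reported below are not mutually exclusive.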

To explore the world of facial emotion in television news, the 812 hours of television were converted into a sequence of 1fps preview images and run through the Vision AI API, yielding a total of 12,612,428 face-seconds (the number of clear human faces detected in each one-second frame, summed across all frames).
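The face-second tally itself is a simple sum: at one frame per second, each frame contributes one second of airtime per clear face it contains. A minimal sketch, using made-up per-frame counts rather than the study's data:

```python
# Sketch of the "face-seconds" tally: at 1 frame per second, each frame
# contributes one second of airtime per clear face detected in it.
# The per-frame counts below are illustrative, not the study's data.
def face_seconds(faces_per_frame):
    """Total face-seconds = sum of detected faces across all 1fps frames."""
    return sum(faces_per_frame)

frames = [2, 0, 1, 4, 3]          # faces detected in five consecutive frames
print(face_seconds(frames))       # 10 face-seconds

# Scale check against the article's numbers: 812 hours at 1fps is
# 812 * 3600 = 2,923,200 frames, so 12,612,428 face-seconds works out
# to roughly 4.3 clear faces per frame on average.
print(round(12_612_428 / (812 * 3600), 1))
```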

Of those, 3.25% depicted the emotion of joy, 0.58% depicted surprise, 0.03% sorrow and 0.004% anger.
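Each percentage is simply the count of face-seconds depicting that emotion divided by the overall total. In the sketch below, only the 12,612,428 total comes from the article; the joyful face-second count is back-calculated for illustration.

```python
# Sketch: converting a per-emotion face-second count into the percentage
# breakdowns reported above. Only the total comes from the article; the
# example count is back-calculated for illustration.
TOTAL = 12_612_428

def emotion_percent(count, total=TOTAL):
    """Share of all face-seconds that depicted the emotion, as a percentage."""
    return 100.0 * count / total

# Roughly 410,000 joyful face-seconds reproduces the 3.25% joy figure.
print(round(emotion_percent(409_904), 2))  # 3.25
```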

Google’s Vision AI API tends to report much higher rates of joy and surprise than anger and sorrow for online news imagery as well, so it is unclear whether these relative breakdowns reflect a fundamental tendency of news imagery to emphasize certain human emotions or whether Google’s algorithms are simply better at detecting joy and surprise. Even if the breakdowns reflect greater algorithmic sensitivity to certain emotions, however, that sensitivity should be constant across stations, allowing the seven stations to be compared directly on each of the four facial emotions.

The graph below shows the percentage of all human faces across the seven stations that depicted any of the four emotions during the examined week. ABC, CBS and NBC seem to exhibit the most recognizable facial emotion, followed by Fox News, MSNBC and CNN, with PBS last, but roughly on par with CNN.

[Chart: percentage of faces depicting any of the four emotions, by station. Credit: Kalev Leetaru]
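Ordering the stations by emotion density is a straightforward sort over per-station percentages. The densities below are invented placeholders that match only the ordering described in the text, not the study's actual values.

```python
# Sketch: ranking stations by the share of faces showing any recognizable
# emotion. The percentages are placeholders reflecting only the ordering
# described in the text, not the study's measured values.
any_emotion_pct = {
    "ABC": 5.1, "CBS": 4.8, "NBC": 4.6, "Fox News": 3.9,
    "MSNBC": 3.4, "CNN": 2.9, "PBS": 2.8,
}
ranking = sorted(any_emotion_pct, key=any_emotion_pct.get, reverse=True)
print(ranking)  # ABC, CBS, NBC first; PBS last, roughly on par with CNN
```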

Most of this ranking is driven by each station’s depiction of joy, seen in the graph below.

[Chart: percentage of faces depicting joy, by station. Credit: Kalev Leetaru]

Turning to surprise, the ranking is almost inverted: PBS has the highest density of the emotion, CBS sits in the middle and ABC comes last.

The four emotions are not mutually exclusive, so ABC’s higher density of joy does not automatically mean it would have less surprise.

[Chart: percentage of faces depicting surprise, by station. Credit: Kalev Leetaru]

Sorrow is a far less common emotion but there is still strong stratification among the stations, with NBC depicting the facial emotion three and a half times more often than MSNBC.

[Chart: percentage of faces depicting sorrow, by station. Credit: Kalev Leetaru]

Finally, anger is the rarest facial emotion of them all. Despite this, it also shows the strongest stratification, with faces on Fox News exhibiting the emotion one and a half times more often than the next highest, PBS, and almost ten times more often than the lowest, NBC.

In fact, angry faces appear on Fox News more than on MSNBC and CNN combined.

[Chart: percentage of faces depicting anger, by station. Credit: Kalev Leetaru]

It is important to keep in mind that AI’s ability to discern something as complex and nuanced as human facial emotion is still very much in its infancy and remains a very active area of research. The results here should therefore be understood not as a definitive conclusion about the prevailing facial emotion on each station, but rather as a first glimpse at the deeper latent dimensions of television news that deep learning can help us explore, at a scale never before contemplated.

Putting this all together, deep learning is opening new frontiers in our ability to understand the visual world, from cataloging the objects and activities seen in television news at scale to assessing entirely new dimensions we’ve never dreamed of, like facial emotion.

In the end, much as textual sentiment analysis has become commonplace, so too someday will visual sentiment analysis, powered by an emerging world of advanced visual deep learning algorithms.

I’d like to thank the Internet Archive and its Television News Archive, especially its Director Roger Macdonald. I’d like to thank Google for the use of its cloud, including its Video AI, Vision AI, Speech-to-Text and Natural Language APIs and their associated teams for their guidance.