Affective computing and the dystopian future it promises
Facial recognition technology has become a part of our everyday life. But now… it’s almost old tech. Next up? Emotional recognition technology.
The concept is easy to grasp. A computer vision algorithm mimics what a human would do when looking at someone; having been trained to log specific points on faces, it will try to figure out what expressions someone is making, and link them to an emotion.
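To make that chain concrete, here is a deliberately naive sketch of the "points on a face, then expression, then emotion" idea. It is not any vendor’s actual method: commercial systems use trained models over many landmarks, whereas this toy uses three hypothetical mouth landmarks, an invented `mouth_curvature` score, and an arbitrary threshold. All names and numbers are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image pixels; y grows downward


@dataclass
class Landmarks:
    # Hypothetical landmark set: real detectors log dozens of points.
    mouth_left: Point
    mouth_right: Point
    mouth_center: Point


def mouth_curvature(lm: Landmarks) -> float:
    """Positive when the mouth corners sit higher than its centre (a smile shape)."""
    width = abs(lm.mouth_right[0] - lm.mouth_left[0])
    if width == 0:
        raise ValueError("mouth corners must not share the same x coordinate")
    corner_y = (lm.mouth_left[1] + lm.mouth_right[1]) / 2
    # Divide by mouth width so the score does not depend on image scale.
    return (lm.mouth_center[1] - corner_y) / width


def guess_emotion(lm: Landmarks, threshold: float = 0.05) -> str:
    """Map an expression measurement to an emotion label (the questionable leap)."""
    curvature = mouth_curvature(lm)
    if curvature > threshold:
        return "happy"
    if curvature < -threshold:
        return "sad"
    return "neutral"


smiling = Landmarks((40.0, 100.0), (80.0, 100.0), (60.0, 106.0))
print(guess_emotion(smiling))
```

Note where the sketch stops being geometry and starts being guesswork: measuring the curve of a mouth is arithmetic, while calling that curve "happy" is an assumption baked into the last function.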
Effectively, an algorithm could tell your mood from the way you look, and offer a product or service accordingly. Got a smile on your face? How about a Starbucks advert for a Frappuccino to keep the good times coming? Got a frown? How about a Starbucks advert for a frozen coffee to turn it upside down?
Where does the training data come from? From you. Technology companies have captured immense volumes of surface-level imagery of human expressions, including billions of Instagram selfies, Pinterest portraits, TikTok videos, and Facebook photos... without ever really asking for permission. And that’s only the start of the many, many issues plaguing emotional recognition technology. With a quickly increasing number of cameras monitoring our lives, here are a few reasons to rethink your unwitting contribution to this madness.
NOTE : This article concentrates on visual emotional recognition. Trials are ongoing with linguistic emotion recognition (NLP) as well as vocal emotion recognition. These technologies, though also in need of improvements, are worthy of a different conversation.
The technology makes no sense
The idea of using selfies to train algorithms to recognise emotions and use them for commercial purposes seems fairly straightforward, but crumbles under the slightest scrutiny. That’s because the entire premise for this technology is flawed.
Firstly, the claim that a person’s emotion can be accurately assessed by analysing their face has been repeatedly debunked. A 2019 systematic review of the scientific literature on inferring emotions from facial movements found that there is no reliable evidence that you can accurately predict someone’s emotional state in this manner. That’s mainly because most algorithms only track the 43 muscles making up the human face, which is a very small number of data points to gather for something as complex as emotions. Detecting movement is easy; drawing conclusions from it is a lot more difficult. We COULD combine computer vision algorithms with voice analysis and biometrics to improve reliability, but that’s currently not viable on a commercial scale. So tech companies keep peddling services they KNOW are not accurate.
Secondly, the data used to train the algorithms is usually manually tagged by humans, who are ALSO really bad at inferring emotions (if this weren’t the case, our romantic relationships would be far easier). These (normal) misunderstandings are supercharged when it comes to emotional recognition technology : the teams doing the data tagging for western clients are often based in Romania or Egypt. This means they bring their own cultural and historical biases, which may affect the way they perceive and categorise emotions.
Thirdly, the technology has an infamous blind spot : race, and the fact that computer vision algorithms tend to be far more accurate for Caucasian faces. A recent study has shown that facial recognition software tends to interpret Black faces as having more negative emotions than white faces, specifically registering them as angrier and more contemptuous, even when controlling for their degree of smiling. This is ethically troubling given that some key use cases for the technology are related to policing or employment. As is so often the case, it appears technology is once again targeting communities already besieged by social challenges.
All this should be enough to end the conversation. But this silly “visual emotion recognition” idea keeps coming back, so I’d like to provide a bit more ammunition to those wishing to fight it on a deeper level.
The use cases make no sense
Samsung, Apple, Affectiva, Amazon, Microsoft, IBM… Today, dozens of companies around the world are developing and selling tools that read people’s faces to identify their emotions, and use them to predict their behaviour. This market is projected to be worth $56 billion by 2024. That’s a terrifying amount of money for something with exclusively dystopian use cases.
Take schools, for example. One company is touting an 80% success rate at identifying sadness and happiness on students’ faces. But is that really necessary to ensure students are engaged? Isn’t 80% fairly low for something as important as children’s education? Teachers are much better trained to understand their students’ feelings, and they can actually do something about it should the need arise. Did I mention these tests were happening in Hong Kong? Seems like just an excuse to monitor schools and students…
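For a rough sense of what 80% means in a classroom, here is a back-of-the-envelope sketch. It assumes the 80% figure is a per-student chance of a correct reading, that readings are independent of one another, and that a class has 30 students; none of these assumptions comes from the company’s claim, which may well be measured differently.

```python
def expected_misreads(class_size: int, accuracy: float) -> float:
    """Average number of students misread in a single scan of the room."""
    return class_size * (1.0 - accuracy)


def chance_all_correct(class_size: int, accuracy: float) -> float:
    """Probability that every student is read correctly, assuming independence."""
    return accuracy ** class_size


CLASS_SIZE = 30  # hypothetical class size
ACCURACY = 0.80  # the touted success rate, read as a per-student probability

print(f"Expected misreads per scan: {expected_misreads(CLASS_SIZE, ACCURACY):.1f}")
print(f"Chance the whole class is read correctly: {chance_all_correct(CLASS_SIZE, ACCURACY):.2%}")
```

Under those assumptions, about six students in the room are misread on every scan, and the odds of getting the whole class right in one pass are roughly one in a thousand.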
It’s also being used in recruitment by the likes of HireVue to weed out candidates for a specific role before a human even gets to them. If you’ve ever had to smile like a maniac while looking at yourself and answering timed questions that pop up on a screen, you know what I’m talking about (if not, congrats, I hope you never do). However, emotion recognition in no way helps a company understand a person’s actual abilities. And the idea that it helps reduce recruiter biases is laughable given the technology’s many, many faults in that regard. At the end of the day, it’s just a way to save a buck, and not a very good one at that.
It’s also being used in courts, within the defence / security industry and police forces… given that the technology is so inherently flawed, I am still baffled that this is the case. Yet, it should come as no surprise : these entities have a long history of purchasing and making use of tools that simply do not work. Additionally, emotion recognition technology was initially created and funded by the US Department of Defense, so it makes sense that they’d try to exercise dibs.
One use case that might be promising is within the media industry. Some marketers have come to use it to analyse reactions to adverts, in order to assess their effectiveness. Similarly, it’s been used to see how audiences react to different characters appearing in movie trailers. Given how much of a cash cow a good character can become, this might at least make some economic sense. However, is it really that much better, that much more effective than a survey or a focus group? I doubt it.
But wait, it gets worse.
The philosophy makes no sense
On a very human level, the idea that we can read and classify emotions is laughable. There are just so many unanswered questions. How many emotions are there, really, and is it possible to list them all? Are they at all universal? Can some of them be felt at the same time as others? Can we ever visually differentiate between righteous anger and white-hot rage? Are some people not more emotive than others?
A bunch of white guys agreed a century ago that there were six universal emotional states : joy, anger, disgust, sadness, surprise, and fear, and the world has somehow just rolled with it since.
But can an algorithm pick up on pena ajena, the feeling of experiencing peripheral mortification for someone else’s embarrassment? What about Torschlusspanik? Tarab? Saudade? Forelsket? I doubt the geniuses over in Silicon Valley even know about these emotions. How, then, could they be expected to create an algorithm to categorise them? Without properly understanding emotions, it could warn a potential recruiter of someone’s unhappiness when that person is in fact just longing for a rainy day. And what of the emotions we cannot control? Should we be blamed, or rewarded, for our subconscious feelings because an algorithm says so?
We don’t even all wear our emotions on our faces! You can detect a scowl, yes, but can you detect anger? Because a scowl can also mean you’re confused, or thinking, or just passing gas… Hell, we even struggle to tell the difference between seeing someone in pain and someone having an orgasm.
When it comes to emotions, so much depends on context, which a computer just cannot grasp. I’m of course talking about different cultures and sub-cultures, some of which are much more emotive than others. History, too, plays an important part in the way we non-verbally communicate, even at the sub-local level. Without this, the picture is too incomplete to make any decisions, least of all important ones relating to justice or employment.
Under the gaze of a facial recognition camera, we are not ourselves; we are digital renderings of ourselves. The face is not the whole story, in much the same way that our logins, browser history, Facebook friends, Twitter posts, and travel records are details of our personal lives but are not, even when pieced together, the whole story.
A hundred years ago, phrenology was used far and wide by hacks claiming to be able to identify criminals by the width of their skulls. What is happening today is not much different. Emotion recognition technology is about as reliable as astrology at predicting behaviour, yet it’s allowed to make decisions that will greatly impact our lives and those of vulnerable populations around us.
Governments need to step up and ban this technology until we have proof that it is being used for good, and that it works as intended. As mentioned above, I don’t think this will ever happen.
In the meantime, hold off on uploading that selfie. Tech companies have enough free images, and I’m sure you’d rather not play any part in techno-fascism’s next chapter.
Good luck out there.
Disclaimer : this article was originally written for Honeypot.io, Europe’s developer-focused job platform.