This post sums up my progress prototyping my final project for Major Studio 2: the technical challenges I have overcome, and the changes I made to the project as I ran into the limitations of both the materials I was using (Arduino, pan/tilt motors, webcam) and the code for emotion recognition and interactivity in openFrameworks.
From VISIONS to CitizenScore
The biggest change moving forward is that I decided to unite my three proposed ideas into a modified version of my original concept.
This project began as “INFORM: Inter-Network Facial Observation & Recognition Monitoring”, a series of critical experiments extrapolating potential future applications of the collection and usage of facial and emotion recognition data.
I wanted to critically explore how people, governments, and institutions may use new interactive technologies to collect and classify information. More importantly, I want to understand just what they will do with the insights they gain.
As mentioned earlier, INFORM consisted of three related explorations that grew out of earlier work: Threat Score, Play with Me and 99 Red Balloons, all of which attempt to look at different consequences of facial and feature tracking.
As I moved along, I realized that I would need a very powerful laser to pop balloons as outlined in 99 Red Balloons, so I ordered parts from China that should arrive in a couple of weeks. That may be in time for a short demo of how this could work, but I’m going to put it aside for now and focus on what I can do with what I have.
After making that decision, I set out to prototype and learn more about the two other experiments and how they might work.
Two Experiments: Threat Score & Play With Me
“Threat Score” seeks to understand how people and institutions will use feature tracking technology to collect people’s emotional responses to media and classify them according to their perceived alignment with desired social paradigms. After users watch a series of current commercials that are politically and socially divisive in nature, feature tracking technology is used to assign them a score and a related descriptor of their quality as a citizen in the modeled social paradigm. This paradigm can be switched at any time depending on who is in power. The result of this experience will be a series of prints rating users as threat levels in the current social paradigm. I am interested in how people will react to these public declarations of their social desirability and how others will treat them as a result.
On the other hand, “Play with Me” explores the idea of creating fun and cute interactions with surveillance technology to keep users engaged long enough to record their facial features, identify them and/or their displayed emotions, and make a judgment as to the threat they pose. This project uses facial tracking, motors, and synthesized voices to play with users, collecting a variety of their expressions. After analysis, the system will target them with a red laser if they are deemed a threat. I want to make people aware of the consequences of allowing computers and algorithms to make autonomous decisions about safety and security, but more importantly to demonstrate how the interconnected systems of sensors and cameras we play with now can later be modified and used by institutions to control and “protect” people in times of crisis.
Both of these ideas started to merge together as I explored the limits of the technologies I was working with, and came up with new ways to make the interactions more seamless and engaging.
My new combined concept is Citizen Score, a speculative design installation that imagines a future where ubiquitous surveillance, facial and feature recognition and artificial intelligence finally combine into a system that can gauge a citizen’s worth and likelihood to dissent.
The Process | CitizenScore Experiments & Testing
Considering the Process
What I propose to do is judge people based on their emotional reactions to a set of stimuli (most likely video). In order to do this, I first need to understand, with a reasonable level of accuracy, which emotion they are experiencing during specific sections of the video.
My research led me to realize there are 7 universal emotions that people express internationally: joy, sadness, fear, anger, surprise, disgust, and contempt. I also learned from Affectiva’s website that you can measure emotional response in terms of engagement and valence.
Engagement: A measure of facial muscle activation that illustrates the subject’s expressiveness. The range of values is from 0 to 100.
Valence: A measure of the positive or negative nature of the recorded person’s experience. The range of values is from -100 to 100.
Emotions are composed of different combinations of our features. For example, a smile might involve raising the corners of your lips, slightly raising your eyebrows, and widening your jaw. In order to predict what emotions people are having, I first need to understand what combinations of features are produced when we are having that emotion. Looking again to Affectiva, I found that their API tracks several of these indicators, with explanations of what you are looking for in terms of the main feature at work, e.g. “the lips pushed forwards” for lip pucker.
Experiment 1: Using Affectiva to set Numerical Values for Emotion Ranges → openFrameworks Feature Values
Unfortunately, openFrameworks does not (currently) have any sort of emotion recognition package. Such a package would need a database of many faces so that a new expression can be compared against previously matched ones, which is why Affectiva’s API is so successful. In other words, we need to be able to say that the tracked facial features fall within the average range of those features for a given emotion, e.g. comparing tracked mouth height against the average range of mouth heights of people showing that emotion.
The good news is that openFrameworks does have facial feature recognition and tracking – observing the change in the values of these features over time.
In order to approximate emotions the way the Affectiva program does (I don’t know enough about OSX to build it into my own project), I will need to create my own average range for these features, which can only be done by recording different people’s faces over a range of emotions. After all, we all have a range of feature sizes and expressions. But since Affectiva is already so accurate, I can ask a test group of users to let me record them expressing different emotions using the Affectiva application. I will record this as a video and feed it into an openFrameworks program that lets me see the value for each of the features available (mouth width, mouth height, nostril flare, left and right eye openness, left and right eyebrow height, and jaw openness). While the video is running, I will record the feature values whenever the emotion probability shown on screen by the Affectiva app is high. So if Affectiva reports a 100% probability of joy, I will record the value of each feature for joy.
The next problem is that this isn’t helpful if I don’t have a neutral expression to go on. If I just say that happiness is when the mouth width is greater than some fixed value, it will never work, as I learned from issues with past projects this semester. Instead, I also need to record a neutral expression value for each person in the experiments above and create a range of values relative to this neutral value.
- Dana’s Neutral Mouth Width = 4.0
- John’s Neutral Mouth Width = 3.7
- Average Neutral Mouth Width = 3.85
- Dana’s Joy Mouth Width = 7.2
- John’s Joy Mouth Width = 6.8
- Average Joy Mouth Width = 7.0
Average Difference = 7.0 – 3.85 = 3.15
When tracking for joy, I can now say that the average mouth width for joy is neutralMouthWidth + 3.15. In this case, both Dana and John will have their neutral mouth width measured at the beginning of the program, and indications of emotion will be based on the difference from this initial reading. I hope this makes the experiments more accurate at detecting emotion (which is really defined by changes in features relative to some neutral feature set) across different people.
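The arithmetic above can be sketched in a few lines of C++. This is only an illustration under assumptions: `FeatureSample`, `averageOffset`, `matchesEmotion`, and the 0.8 tolerance are hypothetical names and values, not part of openFrameworks or Affectiva.

```cpp
#include <cassert>
#include <vector>

// One participant's recorded values for a single feature (e.g. mouth width).
// Hypothetical type for illustration only.
struct FeatureSample {
    double neutral;  // value while the face is at rest
    double emotion;  // value while the emotion is clearly shown
};

// Average change from neutral across all recorded participants.
// For the example above: ((7.2 - 4.0) + (6.8 - 3.7)) / 2 = 3.15.
double averageOffset(const std::vector<FeatureSample>& samples) {
    assert(!samples.empty());
    double total = 0.0;
    for (const FeatureSample& s : samples) {
        total += s.emotion - s.neutral;
    }
    return total / static_cast<double>(samples.size());
}

// True when the live value has moved far enough away from this person's own
// neutral reading. `tolerance` (an assumed value) is the fraction of the
// average offset that must be reached before the feature counts as a match.
bool matchesEmotion(double liveValue, double personalNeutral,
                    double offset, double tolerance = 0.8) {
    return (liveValue - personalNeutral) >= offset * tolerance;
}
```

Because the check uses each person’s own neutral reading, the same offset can be applied to Dana and John even though their resting mouth widths differ.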
At the end of this process, I will have a set of values for each facial feature for each of the different emotions.
The other choice I may make if this proves to be inaccurate or too time-consuming is to instead measure the engagement and valence of the participant. Engagement checks to see how expressive the person’s face is from neutral (aka bored) and valence checks to see if the overall mix of expressions is positive or negative (range is from -100 for bad to 100 for good). I could then measure how engaged someone was in a particular video and whether they were good engaged (thought it was positive) or bad engaged (thought it was negative). While these readings would be a bit broader, they are still accurate enough to satisfy the goals of the project, which are to create a composite score for someone based on their reactions to the videos.
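A rough sketch of this fallback, assuming engagement and valence arrive as plain numbers in the ranges described earlier. The function name `classifyReaction` and the engagement cutoff of 50 are assumptions for illustration, not values from Affectiva.

```cpp
#include <cassert>
#include <string>

// Classify one reading as "good", "bad", or "neutral".
// engagement: 0 to 100, valence: -100 to 100 (ranges as described above).
// engagementThreshold is an assumed cutoff below which the viewer is
// treated as not reacting at all.
std::string classifyReaction(double engagement, double valence,
                             double engagementThreshold = 50.0) {
    assert(engagement >= 0.0 && engagement <= 100.0);
    assert(valence >= -100.0 && valence <= 100.0);
    if (engagement < engagementThreshold || valence == 0.0) {
        return "neutral";  // not expressive enough, or no clear direction
    }
    return valence > 0.0 ? "good" : "bad";
}
```

The cutoff would need tuning against real recordings; it is only a placeholder here.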
After trying the above process to see how it would work for setting variables, I ran into an issue. When I try to put sets of features together to define a particular emotion, I can’t get them to register as a match. If I base happiness only on the width of the mouth (for example), since that is the biggest change, I can predict smiling, but when I then try to layer on another facial feature the detection stops working reliably. I think I may have to be very conservative with my values and only use the feature pairs that change the most. For example, smiling is a combination of mouth height and width according to my preliminary research.
The other option is to record people’s reactions to the video using the Affectiva application, save a video, then run the video through an openFrameworks program where I can save specific samples of the features to a YML file. This is the next approach I will take to see if it works better.
- Try recording emotions in Affectiva first (at beginning of video) and use the Expressions part of FaceTracker library to record each user’s individual emotion range to get a baseline at the start of each test. Then play a short video with the material we are testing for.
- Use probabilities of certain expressions to trigger counter system. For example, if probability of happiness > 95%, racismCounter = racismCounter + 1, and so on.
Experiment 2: Using Affectiva to Set YML Expression File Used in OpenFrameworks
In this experiment, I again used the Affectiva OSX app to reliably measure emotion ranges. However, given the limitations of openFrameworks and detecting some of the more subtle facial feature shifts (like one lip corner higher than another) I decided to simplify and only record “bad” and “good” emotions.
This actually has precedent in research regarding emotional reactions to stimuli such as “The Facial and Subjective Emotional Reaction in Response to a Video Game Designed to Train Emotional Regulation (Playmancer)” which measures anger and joy, pointing out that “they represent a positive and a negative basic emotion. Using a positive and a negative emotion is a commonly used method in the current literature in the field.”
I decided to eliminate anger and fear since, of all the emotions in Affectiva’s software, they were the hardest to register at 100% in my experiments.
After these simplifications, the Citizen Score will measure:
Neutral Face – Baseline
Using Affectiva to measure the reliability of the given emotion, I will simultaneously run an openFrameworks application that can save different expressions as YML files, which record the pixel information of a given image. These expressions can later be loaded for live video of a face and can give a probability that the expression is being shown again. Using this method, I will be able to load a live video of the person sampled and record the likelihood of them presenting a positive or negative emotion using the preset YML files.
I first tried this approach using myself as a baseline. I wrote a new openFrameworks program modeled after the “expressions” example in the ofxFaceTracker library. This allows me to add new expressions to measure in a given camera/video stream, as well as multiple samples of an expression to ensure reliability (e.g. between a small smile and a large smile for a good response, or between disgust and sadness for a bad response).
I then ran the Affectiva program simultaneously with my openFrameworks program and when the Affectiva emotion score was 100%, saved a sample of the given expression I was measuring to my openFrameworks YML files. After this I was able to modify the description line of each YML file to “bad”, “good”, “neutral” for processing later.
- This method proved much more reliable than trying to set ranges of feature sizes to predict emotion. Since the YML file actually saves the pixel data of the image/video it is analyzing, it can much more accurately predict a similarity in this data later.
- I also helped a classmate experiment using this theory and we were able to build a program to recognize different people, provided they all had a neutral expression! I think adding more samples of each person wearing different expressions could potentially solve that problem.
- At the end of the day, this method is actually similar to how Affectiva was created, building a database of different people wearing different expressions and predicting the probability that those expressions match or not.
- This method allowed me to collect the data I need and have a working baseline of expression examples to run through my CitizenScore program for analysis and judgment.
With the YML files created using the method above, I was ready to move forward with testing CitizenScore. I modified my previous program to allow me to load the collected files and use them as baselines for modifying two counters, badCounter, goodCounter. A neutral expression makes no change to either.
Using this method I was able to quickly get both counters up and working, using the description in the YML file to discern which emotion was currently on screen. I loaded an earlier video I had taken of myself doing several different expressions and ran it through CitizenScore, successfully changing both counters to get a read on the good and bad emotions I was displaying in the video.
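The counter logic can be sketched roughly as follows. This is not the actual CitizenScore code: `ExpressionMatch` is a hypothetical stand-in for whatever the expression classifier reports per frame, and the 0.95 cutoff is an assumption borrowed from the 95% idea noted earlier.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one classifier result on one video frame.
struct ExpressionMatch {
    std::string description;  // "good", "bad", or "neutral" (the YML description)
    double probability;       // 0.0 to 1.0
};

struct Counters {
    int good = 0;
    int bad = 0;
};

// Take the most probable expression for this frame and, if it is confident
// enough, bump the matching counter. "neutral" changes neither counter.
void updateCounters(Counters& counters,
                    const std::vector<ExpressionMatch>& matches,
                    double minProbability = 0.95) {
    assert(minProbability >= 0.0 && minProbability <= 1.0);
    const ExpressionMatch* best = nullptr;
    for (const ExpressionMatch& m : matches) {
        if (best == nullptr || m.probability > best->probability) {
            best = &m;
        }
    }
    if (best == nullptr || best->probability < minProbability) {
        return;  // nothing detected with enough confidence
    }
    if (best->description == "good") {
        ++counters.good;
    } else if (best->description == "bad") {
        ++counters.bad;
    }
}
```

Calling `updateCounters` once per frame keeps the scoring separate from the tracking code, so the threshold can be tuned without touching the classifier.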
One part I am unsure of regarding effectiveness is the number of samples that should be taken of a given expression. For example, should joy only be recorded as a big open mouth “Duchenne smile” or “real smile” which involves a contraction of both the muscle that raises the corner of the mouth and the muscles which raise the cheeks and form crow’s feet around the eyes? Or should I also include smaller variants that just involve raising the corners of the lips?
(Many) Next Steps
- Collect several videos or YML samples of classmates performing the bad and good types of emotions listed above. Decide whether to use just one emotion pair (joy vs. anger, joy vs. disgust, or joy vs. sadness) or to save samples of multiple bad emotions for accuracy.
- If it is a video that is collected, run it through openFrameworks to create YML files for neutral, good, bad for each person (if live just save the YML files and label)
- Modify descriptions for all collected files to “good”, “bad”, “neutral” to feed into CitizenScore
- Create a 3-minute video depicting pro/anti-racism, pro/anti gay, liberal v. conservative or pro/anti-sexism
- Use an elapsed-time function in openFrameworks (e.g. ofGetElapsedTimef) to modify the counter variables according to the desired response during the correct section of the video
- User test and have users watch the video and have their score run through citizenScore
- At the end of the test, tell the user if they are Racist, Homophobic, Liberal v. Conservative (have score flash on the screen?)
- Maybe give them some sort of card or something????
- Collect screenshots from video and use them to create citizen score prints.
- Potentially use the citizenscores to feed into the face-tracking camera experiment conducted separately as a way to decide if people are dangerous….
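The elapsed-time step in the list above could look something like this. It is a sketch under assumptions: `VideoSegment`, `segmentAt`, and `scoreDelta` are hypothetical names, the segment times are made up, and in an openFrameworks app the `elapsedSeconds` argument would presumably come from something like `ofGetElapsedTimef()` measured from when the video starts.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one section of the stimulus video.
struct VideoSegment {
    double startSeconds;
    double endSeconds;
    std::string topic;         // e.g. "racism"
    std::string desiredLabel;  // reaction the modeled paradigm wants: "good" or "bad"
};

// Returns the segment covering elapsedSeconds, or nullptr between segments.
const VideoSegment* segmentAt(const std::vector<VideoSegment>& segments,
                              double elapsedSeconds) {
    for (const VideoSegment& s : segments) {
        assert(s.startSeconds < s.endSeconds);
        if (elapsedSeconds >= s.startSeconds && elapsedSeconds < s.endSeconds) {
            return &s;
        }
    }
    return nullptr;
}

// +1 when the observed reaction matches the desired one for the current
// segment, -1 when it is the opposite, 0 for neutral or outside any segment.
int scoreDelta(const std::vector<VideoSegment>& segments,
               double elapsedSeconds, const std::string& observedLabel) {
    const VideoSegment* s = segmentAt(segments, elapsedSeconds);
    if (s == nullptr || observedLabel == "neutral") {
        return 0;
    }
    return observedLabel == s->desiredLabel ? 1 : -1;
}
```

Swapping the `desiredLabel` values is all it would take to switch the modeled paradigm, which fits the idea that the paradigm can change depending on who is in power.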
Potential Videos for Reaction Measurement
- Coke SuperBowl Ad (Pro Diversity in America) – https://www.youtube.com/watch?v=RiMMpFcy-HU&nohtml5=False
- Ronald Reagan – We Must Fight US Military Mix – https://www.youtube.com/watch?v=tpH5L8zCtSk&nohtml5=False
- Love Has No Labels – Ad Council – https://www.youtube.com/watch?v=PnDgZuGIhHs&nohtml5=False
- “Wheres Mommy?” Anti Gay Ad – https://www.youtube.com/watch?v=fLDP8tdeqpY&nohtml5=False
- Sexism & Sexual Violence in Ads – https://www.youtube.com/watch?v=9OQYTy1FU_c&nohtml5=False