Wrapping up my first project for Major Studio 2: a set of mini-projects exploring potential uses for, and critiques of, applications powered by computer vision. This project also served as an opportunity to study the history of computer vision and learn about precedents that utilized its power, whether artistic or practical.
After my first few weeks working on this project I had a bit of a double-whammy disaster strike.
First, my trusty MacBook Pro 2009, lovingly refurbished this September with a new hard drive and RAM, finally died. It turned out (after 2 days at MicroCenter in Sunset Park) that the SATA cable was bad and I needed to buy a new one. Unfortunately, that would take 2 weeks! So after 3 or 4 days of borrowing laptops every 4 hours from the sixth floor, which also coincided with the entry-round deadline of the competition for my job, I sucked it up, went to the Apple store, and got a brand-new 15-inch MacBook Pro, upgraded as best I could of course.
About 2 days after obtaining my new computer, I also came down with a terrible case of the flu, then stomach flu, which had been circulating around D12 for a while now. Gross.
Both of these things, paired with some guidance from Sven, actually led to a change in direction. In the middle of it all I met with him to discuss the next steps for my project and started to tell him about some of the research into face recognition I had done on the borrowed computers, and the science fiction I was reading. His response was to push further: to move beyond using the face as just a trigger and really understand what an interaction between my technology and the user is. To fully explore this, he suggested I do a series of mini-experiments around facial recognition and tracking.
I always like to understand the history of something before really getting into it, so I decided to do some research into facial tracking. What I found was that it is a small part of a much larger library called Open Source Computer Vision (OpenCV).
OpenCV is an open-source computer vision and machine learning software library built to provide a common infrastructure for CV applications and to accelerate the use of machine perception in commercial products. It is written natively in C++.
It has more than 2500 optimized algorithms to do things like:
- Detect and recognize faces
- Identify objects & track moving objects
- Classify human actions in videos
- Extract 3D models of objects & produce 3D point clouds
- Follow eye movements
- Recognize/establish markers for augmented reality
After learning more about this background, I was fascinated by some of the things we can do with this “computer vision”.
I was mostly interested in this through the metaphor of what “computers see” as they monitor us. With so many surveillance cameras watching us (I counted 24 on my 5-block walk home from the subway), it raises the question: what exactly are they seeing? And since computer vision can take still images, recorded video, and live video streams as input, what can be extrapolated by processing all of that information?
As I was a bit laid up at home being sick, I decided to get a couple of things working right so I could explore this moving forward.
- Get my Kinect working as a web camera (I didn’t feel like buying a new one until I knew what was needed). The Kinect also has some extra fun sensors in it that can be used.
- Get everything working in OpenFrameworks. I’m not really much of an expert at Processing when it comes to added features, so I wanted to get my project fully running in OpenFrameworks so I could use all of the additional libraries.
Breakthrough 1: OpenFrameworks Worked!
After hours of playing around with the FaceTracking examples, I realized that a couple of headers were missing and some important model files needed to be copied over to get any of the examples to run (as my debug log was telling me, though it took me a bit to pay attention). Finally I was able to get 4 of the 6 examples working, and even all of the examples from the Face Substitution library. Woohoo!
Breakthrough 2: Kinect Is Easy to Set Up
In both Processing and OpenFrameworks I had little trouble getting my Kinect to work (a huge relief after my fiasco playing with it this fall). I immediately noticed through many of the examples that the Kinect could capture lots of interesting depth imagery, and using OpenCV those images could be manipulated in really cool ways! This led to me shooting a ton of “surveillance footage” from my window while sick, and later at school once I was better.
Exploration 1: Computer Vision
This idea of surveilling people paired very interestingly with some reading I was doing for Urban Interaction Design: Jane Jacobs’s The Death and Life of Great American Cities. She writes about the phenomenon of “street surveillance” by ordinary people, who bestow a level of civility on the street simply by being there. This made me question how that differs from knowing we are surveilled by cameras (you would think with people behind them) in our day-to-day lives. Which type of surveillance is more effective, that of people or that of cameras? To explore this, I made my first mini-project.
Computer Vision is a video project shot with a Kinect that explores how our computers see us in a world where we are almost always in view of one or more cameras – now even in our own homes.
Exploration 2: EmotiControl
One of the OpenFrameworks examples centered around capturing users’ facial expressions and saving them to be recalled later as the user cycled through emotions. In this way one could save a “happy” picture of the user versus a “sad” picture and see how likely it was that they showed that emotion later on.
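The save-and-recall mechanism above boils down to nearest-neighbor matching: each saved expression is reduced to a small feature vector, and a new frame is labelled with whichever saved expression it sits closest to. A minimal sketch, with made-up feature names and values (the real example derives its features from the face tracker's mesh):

```python
import math

# Hypothetical saved expressions: each is a tiny feature vector,
# e.g. [mouth_width, brow_height], captured while the user posed.
saved = {
    "happy": [0.9, 0.2],
    "sad":   [0.3, 0.1],
}

def closest_emotion(features):
    """Return the saved label whose feature vector is nearest to `features`."""
    return min(saved, key=lambda label: math.dist(saved[label], features))

print(closest_emotion([0.85, 0.25]))  # → happy
print(closest_emotion([0.30, 0.12]))  # → sad
```

With only a couple of reference poses per user this stays trivially fast, which is why the cycle-through-emotions demo can run per frame.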
I was super fascinated by this application given “Dead Space for the Unexpected,” the story I mentioned in my last post, in which a company monitors employees’ emotions during meetings and uses them to measure how effectively they perform. Emotion recognition technology will absolutely be able to tell when users are sad/happy/frustrated/engaged/confused, etc., but how will it be applied? I’ve already seen examples in advertising scenarios, but I imagine it could easily be used to judge people’s performance in things like job interviews, meetings, webinars, and presentations. What will we do with this additional information? How can we use it, and what applications will it have? How will these applications benefit us or come to infringe upon our every interaction?
This also has a huge application right now given the idea of Emotional Intelligence (EI), the ability to identify, assess, and control the emotions of oneself, of others, and of groups.
Many companies have been exploring how this somewhat intangible aspect of one’s intellectual/social/emotional makeup predicts success in the workplace. From the preliminary studies used to test it (usually videos and surveys), they have found that people with a high EI (or EQ) are very likely to succeed at work and drive the company forward.
Given our advances in emotion recognition, how important could our ability to master our own emotions and to work with others emotions become in the future? If you could have augmented information about how others were feeling, what would you do with it in performance and workplace settings?
EmotiControl seeks to explore this question by judging how happy, afraid, and surprised users were, monitoring their emotional reactions to a series of videos: happy, disgusting, sad. By watching users’ faces during these videos, it is possible to see how their emotions change minutely. How could we extrapolate this information further as an indication of, say, their psychological state or empathy for others? How could these readings be used to make inferences and classifications about the users?
Exploration 3: Keeping Up Appearances
As I mentioned in my last post about this project, one thing that kept coming up for me was when emotion recognition is taken beyond interaction and data collection to a form of control. By knowing how people are feeling and being able to quantify that information, acceptable parameters can be set for displayed emotion.
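Setting “acceptable parameters for displayed emotion” can be sketched very simply: keep a rolling window of per-frame smile scores and fire an alarm when the average drifts outside the allowed range. This is a hypothetical sketch (the class name, window size, and threshold are all my own assumptions, not part of the installation):

```python
from collections import deque

class SmileMonitor:
    """Flag a viewer whose recent average smile score falls below a threshold."""

    def __init__(self, window=5, threshold=0.5):
        # deque(maxlen=...) automatically discards the oldest score.
        self.scores = deque(maxlen=window)
        self.threshold = threshold

    def update(self, score):
        """Record one frame's smile score (0.0–1.0); return True if an alarm fires."""
        self.scores.append(score)
        average = sum(self.scores) / len(self.scores)
        return average < self.threshold

monitor = SmileMonitor()
for score in [0.9, 0.7, 0.3, 0.2, 0.1]:  # a smile fading over five frames
    alarm = monitor.update(score)
print(alarm)  # the rolling average has dropped below the threshold
```

The rolling window matters: it keeps a single mistimed frame from triggering the alarm, which is exactly the kind of tuning an installation about enforced cheerfulness would need.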
Using a similar principle to my first exploration, where interactions were triggered by smiles, I wanted to flip the idea on its head and do a project about what happens when computers can tell we aren’t smiling enough.
Keeping Up Appearances is an interactive installation using face-tracking technology to explore the dystopian consequences of emotion/behavior monitoring.
Conclusion & Next Steps
All three of these projects led me to question several things about facial and emotion recognition. Given that computer vision allows anyone to extract information from live and recorded video, what does that mean for the future of surveillance and interaction in public spaces? If things such as presence, identity, gestures, and emotions can be classified, and inferences extrapolated from that information, what kind of systemic reactions will there be in response?
I am very excited about the great potential applications of facial/emotion recognition technology for things like emotion training, beautiful and meaningful interactions, and identification and security; however, I’m also concerned about its potential for control.
What will it mean when everything around us has embedded sensors that can track our presence, emotional state, identity, etc. and link it back to the online presence we leave behind? All of this collected information can be used to make beautiful, personal interactions tailored to us (advertising, suggestions, etc.), but it can also be used to control us by changing the way we behave in public.
We had to give a quick summation of our next project during the presentation, and I proposed taking my “Smile Alarm” project, Keeping Up Appearances, further by making it a bit more complex and interactive.
I seek to answer the questions:
- What are the consequences of increasingly ubiquitous surveillance combined with emotion recognition?
- How can systems modify behavior, instill fear, and change social patterns through feedback based on measurements of groups’ emotions?
In order to do that, I propose to further refine Keeping Up Appearances into an interactive installation using cameras, face-tracking technology, and emotion-recognition algorithms to explore the dystopian consequences of ubiquitous surveillance and technology in our everyday lives.
Initial feedback from the presentation is that I should keep experimenting, so I will see how I feel about that. I really want a finished product demonstrating my research, so I’m also tasking myself with better documentation moving forward with the new camera and video camera I will get this weekend. 🙂
Here is my final presentation for this project.