If you haven’t experienced Kinect yet, do yourself a favor and just go do it! For people who have just landed on Earth, Kinect is a motion sensing device for the Microsoft Xbox 360 video game console. You can check out the video here. Instead of using buttons or controllers to play video games, this device lets you use your body. Your movements are captured, and the video game character moves accordingly. When Kinect first came out in 2010, it took the world by storm! How is it possible to capture our body movements so accurately without using any wires? How does it recognize our gestures?
How does it see?
The innovative technology behind Kinect is an ingenious combination of hardware and software. It tracks our body with very high accuracy so that our movements are replicated on the screen. It includes a color video camera that captures red, green, and blue channels (the standard RGB format used in most cameras), which aids feature detection and face recognition. The Kinect camera works at a 640×480 resolution and captures 30 frames per second.
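To get a feel for how much data that is, here's a quick back-of-the-envelope calculation, assuming 3 bytes per pixel for uncompressed RGB:

```python
# Back-of-the-envelope: raw data rate of Kinect's RGB camera.
# Assumes 3 bytes per pixel (one byte each for R, G, B), uncompressed.
width, height = 640, 480
bytes_per_pixel = 3
fps = 30

frame_bytes = width * height * bytes_per_pixel   # 921,600 bytes per frame
rate_mb_per_s = frame_bytes * fps / 1_000_000    # ~27.6 MB/s

print(f"One frame: {frame_bytes} bytes")
print(f"Raw stream: {rate_mb_per_s:.1f} MB/s")
```

That's roughly 27.6 MB every second from the color camera alone, before the depth and audio streams are even counted.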
How does it know where I am standing?
The technology that is usually used in this case is stereo photography. We place two cameras beside each other and capture the same scene from two slightly different perspectives. We can then work out which objects are nearer and which ones are further away. You can read more about depth perception in this blog post. Does this setup remind you of something? Yes, it’s our eyes! We have two eyes because it’s not possible to accurately determine the distances of objects without capturing them from two different points.
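The geometry behind this fits in a few lines. For two parallel cameras separated by a baseline B, a point that appears shifted by a disparity of d pixels between the two images lies at depth Z = f·B/d, where f is the focal length in pixels. The focal length and baseline below are made-up illustrative values, not any real camera's calibration:

```python
# Stereo depth from disparity: Z = f * B / d
# f: focal length in pixels, B: baseline between the cameras in meters,
# d: disparity (horizontal pixel shift of the same point between the two images).
# The numbers here are illustrative, not real calibration data.

def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in meters of a point seen with the given pixel disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (point at infinity otherwise)")
    return focal_px * baseline_m / disparity_px

f = 600.0   # assumed focal length in pixels
B = 0.075   # assumed 7.5 cm baseline

# Nearby objects shift more between the two views: larger disparity = closer.
print(depth_from_disparity(f, B, 30.0))  # 1.5 m
print(depth_from_disparity(f, B, 15.0))  # 3.0 m
```

Note how halving the disparity doubles the depth: far-away objects barely shift between the two views, which is exactly why depth estimates get less precise with distance.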
The problem with stereo cameras is that they use regular camera technology, which makes them susceptible to bad lighting conditions. It’s always difficult to get good pictures under bad lighting. To avoid this problem, Kinect uses a depth sensor instead: an infrared projector and a monochrome CMOS sensor work together to perceive the room in 3D regardless of the lighting conditions. The projector casts a known pattern of infrared dots across the room, and the sensor observes how that pattern shifts and deforms on objects at different distances; triangulating those shifts yields the depth of every point. This technique is called structured light (the time-of-flight approach, which measures how long light takes to bounce back, much like SONAR, arrived later with Kinect v2). Using this technology, Kinect can distinguish objects’ depth within 1 cm and their dimensions within 3 mm. Pretty darn good, right?!
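However the distances are measured, what the depth sensor hands to the software is a per-pixel grid of distances, i.e. a depth map. Here's a tiny toy sketch of what that data looks like and how you might pick out a nearby player; every value is invented for illustration:

```python
# A depth camera's output is a per-pixel grid of distances (here, millimeters).
# Toy 4x4 depth map; 0 would mean "no reading" (e.g. an IR shadow). All values invented.
depth_mm = [
    [2500, 2500, 2480, 2510],
    [2500,  900,  910, 2490],
    [2510,  905,  915, 2500],
    [2490, 2480, 2500, 2505],
]

# Segment "foreground" (a player) by thresholding: anything nearer than 1.5 m.
THRESHOLD_MM = 1500
foreground = [[0 < d < THRESHOLD_MM for d in row] for row in depth_mm]

nearest = min(d for row in depth_mm for d in row if d > 0)
print(nearest)           # 900
print(foreground[1][1])  # True: that pixel belongs to the near object
```

Real pipelines do far more (smoothing, hole-filling, connected components), but thresholding a depth map is the essence of separating a player from the living-room background.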
Can it hear me?
Kinect also has an array of microphones, which it uses to isolate the players’ voices from the noise in the room. This allows us to use voice commands to choose options in the video game. I have talked more about speech recognition in this blog post.
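A microphone array can separate voices from noise by exploiting arrival-time differences: sound from the player’s direction reaches each microphone with a slightly different delay, and summing the signals after compensating for those delays reinforces that direction while averaging out the rest. A minimal delay-and-sum sketch on synthetic signals (the waveform and delays are invented for illustration):

```python
# Delay-and-sum beamforming on synthetic signals (pure-Python toy example).
# A source arrives at each microphone with a known sample delay; shifting each
# channel back by its delay and averaging reinforces sound from that direction.

def delay_and_sum(channels, delays):
    """Align each channel by removing its delay, then average the channels."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[d + i] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n)
    ]

source = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]  # toy waveform

# Simulate three microphones hearing the source with different arrival delays.
delays = [0, 2, 3]
channels = [[0.0] * d + source for d in delays]

aligned = delay_and_sum(channels, delays)
print(aligned)  # recovers the original waveform
```

Sound arriving from any other direction would hit the microphones with different delays than the ones we compensated for, so its copies would partially cancel instead of adding up.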
How does it understand my movements?
There’s no point in having state-of-the-art hardware to capture data if you don’t know how to use it. This is where the software layer comes in. When you start up Kinect, it looks at the room and configures the playing area. It then tracks 48 points on each player’s body. Now that we have all this tracking data, we need to understand the players’ movements. This is a classic case of machine learning. The Kinect developers collected a lot of training data covering people of different ages, heights, genders, clothing, and so on. They trained Kinect with all this data so that it can accurately understand what different gestures mean. I have talked more about machine learning in this blog post. With this training, Kinect is able to classify the skeletal movements of each player. For people who are interested, the Kinect SDK is available for Windows, and you can write your own apps in C++/CLI, C#, or Visual Basic .NET. If you like Python, you can check out PyKinect. A whole bunch of startups are basing their ideas solely on Kinect!
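The "train on labeled examples, then classify new poses" idea can be sketched with a toy classifier. Suppose each pose is boiled down to a tiny feature vector (say, two joint angles); a nearest-centroid classifier then assigns a new pose to the closest gesture class. Everything below, including the features, labels, and numbers, is invented for illustration; Kinect's real pipeline is vastly more sophisticated:

```python
# Toy nearest-centroid gesture classifier on made-up joint-angle features.
# This only illustrates the idea of training on labeled poses; it is not
# how Kinect actually works.

from math import dist  # Euclidean distance (Python 3.8+)

# Invented training data: (joint-angle feature vector, gesture label)
training = [
    ((170.0, 170.0), "arms_raised"),
    ((160.0, 175.0), "arms_raised"),
    ((20.0, 15.0),   "arms_down"),
    ((10.0, 25.0),   "arms_down"),
]

def centroids(samples):
    """Average the feature vectors of each gesture class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(x / counts[label] for x in acc)
            for label, acc in sums.items()}

def classify(features, cents):
    """Assign a pose to the gesture whose centroid is nearest."""
    return min(cents, key=lambda label: dist(features, cents[label]))

cents = centroids(training)
print(classify((165.0, 168.0), cents))  # arms_raised
print(classify((18.0, 22.0), cents))    # arms_down
```

The point of collecting data from people of many ages, heights, and body types is exactly to make classes like these robust: the more varied the training poses, the better new players fall near the right centroid.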
Kinect is a Natural User Interface. It enables natural interaction with our machines, as opposed to a Graphical User Interface, which requires a medium like a mouse or a controller to interact with the machine. It takes a lot of tech to make an interface come together so seamlessly. The science is applied so thoroughly and so beautifully that Kinect became a brilliant work of art!