Augmented Reality (AR) has been one of the most exciting fields to come into prominence in the last few years. When people first started working seriously on computer graphics, great innovations took place. Today we have 3D movies with high-end computer graphics, but they are still confined to the screen inside our machines. People then started thinking about how to pull the graphics out of the screen and integrate them into the real world. The result of that effort is augmented reality. It blurs the line between what’s real and what’s virtual, enhancing our perception of reality. You can take a look at this video to see what I’m talking about. How does this technology work? How does it track the marker?
Why do we need augmented reality?
Augmented reality basically changes the way we view the world. One of the most famous examples of this technology would be Google Glass. I have talked more about it here. With the help of this technology, informative graphics will appear in your field of view. This will be in sync with the movement of your head. Similar applications already exist on smartphones. You can try out Layar or Wikitude. These apps will overlay information on top of the real world. You can just point your phone towards something and the info will pop up.
Is augmented reality just about overlaying text on top of an image?
Most of the so-called AR apps overlay text on top of the real world. Purists will argue that this is not true AR, because those apps don’t actually “see” the world; they just use GPS information and display content based on your location. The one you saw in the video earlier is closer to what I am talking about. When you point your device at something, it should genuinely see and understand what’s there, and only then overlay graphics on top of it. This is called vision-based AR. The computer-generated graphics behave like real objects: as you move around, you see different sides of the object.
Why was that lady in the video wearing a marker on her forehead?
The black and white marker on her forehead is called a fiducial marker. A fiducial marker is an object placed in the field of view of an imaging system to serve as a point of reference or a measure. Fiducial markers are often manually attached to objects in a scene so that the objects can be recognized; for example, a light-emitting diode can be attached to an object to track it. In AR systems, predefined patterns are used so that they can be tracked reliably. These patterns are used to train the system, and the algorithms then recognize them in the real world and replace them with the desired graphics. The advantage of this approach is that the predefined markers make the system very robust; the disadvantage is that the markers are fixed and the system must be trained on them beforehand.
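To make the idea of predefined patterns concrete, here is a minimal sketch in the spirit of square binary markers (such as those used by the ArUco family). Each marker is a small black-and-white grid; at runtime the decoded grid is compared, in all four rotations, against a trained dictionary. The dictionary contents and function names below are made up for illustration, not taken from any specific toolkit.

```python
import numpy as np

# Hypothetical "trained" dictionary of 3x3 binary marker patterns
MARKER_DICT = {
    0: np.array([[1, 0, 1],
                 [0, 1, 0],
                 [1, 1, 0]]),
    1: np.array([[0, 1, 1],
                 [1, 0, 0],
                 [0, 1, 1]]),
}

def identify(grid):
    """Return (marker_id, rotation) of an exact match, or (None, None)."""
    best = (None, None, 1.0)  # id, rotation, normalized Hamming distance
    for mid, pattern in MARKER_DICT.items():
        for rot in range(4):
            # Fraction of cells that disagree after rotating the observation
            d = np.mean(np.rot90(grid, rot) != pattern)
            if d < best[2]:
                best = (mid, rot, d)
    mid, rot, d = best
    return (mid, rot) if d == 0 else (None, None)

# A simulated camera observation: marker 0, seen rotated 90 degrees clockwise
observed = np.rot90(MARKER_DICT[0], -1)
mid, rot = identify(observed)  # recovers the id and the rotation
```

Real systems add error-correcting bits so a few misread cells don’t break identification, but the matching principle is the same.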
How do we detect those markers?
Let’s do a little deep-dive here. The reason we use fiducial markers is that they are easy to recognize. The black and white markers have high contrast, which makes it easy for computer vision algorithms to quickly locate the marker. Once a frame is captured, it is processed and made ready for feature extraction. The algorithm looks for the corners of the square pattern; corner detection is pretty robust, which is why we rely on it. Once the pattern is found, the next step is to figure out where we are standing relative to it, i.e. the position and orientation of the camera. This is called pose estimation. Once that is done, we know where to place the graphical object and with what perspective.
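The corner-finding step above can be sketched with a Harris-style corner response. This is a deliberately simple pure-NumPy version, just to show why high-contrast squares are easy to find; production AR systems use optimized detectors from libraries such as OpenCV, and the parameters here (`k`, the threshold, the window size) are illustrative.

```python
import numpy as np

def box_sum(a, r=1):
    # Sum over a (2r+1)x(2r+1) window via shifted copies (clear, not fast)
    p = np.pad(a, r, mode="edge")
    h, w = a.shape
    out = np.zeros_like(a, dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy : dy + h, dx : dx + w]
    return out

def harris_corners(img, k=0.04, rel_thresh=0.1):
    # Image gradients and windowed structure-tensor entries
    Iy, Ix = np.gradient(img.astype(float))
    Sxx, Syy, Sxy = box_sum(Ix * Ix), box_sum(Iy * Iy), box_sum(Ix * Iy)
    # Harris response: large and positive only where the gradient is strong
    # in both directions at once, i.e. at corners rather than along edges
    R = Sxx * Syy - Sxy * Sxy - k * (Sxx + Syy) ** 2
    return np.argwhere(R > rel_thresh * R.max())

# Synthetic high-contrast "marker": a white square on a black background
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
corners = harris_corners(img)  # pixel coordinates near the four corners
```

From the four detected corner positions, pose estimation then solves for the camera’s position and orientation relative to the known square.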
Can we do it without markers?
This is the holy grail of computer vision! Most of the AR research is going towards markerless technology. Generic object detection and tracking is a hard problem and researchers have been working on it for decades. The reason for this is that there are so many variations of the same object. Since this has to run in real time, we cannot rely on algorithms which are heavy on computation. Some of the approaches in this domain are:
Wavelet based: Wavelet analysis is used to separate moving objects from the background. Wavelet analysis decomposes a signal into time-frequency space simultaneously.
Graph cuts: Objects modeled using Spatial Color Gaussian Mixture Models.
SIFT: Detection using Hough transform followed by tracking using SIFT. This is a powerful feature but the problem is that this takes a lot of time and cannot run in real time.
Probability distribution matrices: Tracking using displacement of a point as a probability distribution over a matrix of possible displacements.
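The last bullet can be sketched as a toy example: score every candidate displacement of a small patch within a search window by patch similarity, then normalize the scores into a probability distribution over a matrix of possible displacements. The function name and parameters are illustrative, not from any specific paper.

```python
import numpy as np

def displacement_distribution(prev, curr, y, x, patch=3, search=4):
    """Probability over displacements (dy, dx) of the patch at (y, x)."""
    ref = prev[y - patch : y + patch + 1, x - patch : x + patch + 1]
    shifts = range(-search, search + 1)
    ssd = np.zeros((len(shifts), len(shifts)))
    for i, dy in enumerate(shifts):
        for j, dx in enumerate(shifts):
            cand = curr[y + dy - patch : y + dy + patch + 1,
                        x + dx - patch : x + dx + patch + 1]
            ssd[i, j] = np.sum((ref - cand) ** 2)
    # Lower dissimilarity -> higher probability (softmax of negative SSD)
    p = np.exp(-ssd)
    return p / p.sum()

# Synthetic frame pair: a bright dot that moves 2 pixels to the right
prev = np.zeros((21, 21)); prev[10, 10] = 1.0
curr = np.zeros((21, 21)); curr[10, 12] = 1.0
p = displacement_distribution(prev, curr, 10, 10)
dy, dx = np.unravel_index(p.argmax(), p.shape)  # offset by `search`
```

Keeping the whole distribution, rather than just the best displacement, lets the tracker express uncertainty, which is useful when the patch is ambiguous or partially occluded.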
A lot of companies are making strong efforts to standardize this technology. These efforts are spearheaded by Qualcomm, Sony, Total Immersion, Metaio and a bunch of other startups. There are toolkits available for developers to build algorithms and apps for augmented reality. Only time will tell if this technology matures enough to affect our day-to-day lives!