People in computer vision and graphics deal with homogeneous coordinates on a very regular basis. They are actually a nice extension of standard three dimensional vectors and allow us to simplify various transforms and their computations. When I say “transformations”, I am talking about all those special effects on the screen, and the corresponding movements and scaling of various objects. But why do we need homogeneous coordinates to do all that? Why can’t we just move the objects around? Well, we can’t directly do that, not easily anyway! This will become clear soon. The concept of homogeneous coordinates is fundamental when we talk about cameras. In order to design our algorithms, we need to understand how the cameras are looking at the real world. This is in fact utilized heavily by game programmers as well. So what is it all about? Why is it so important?
Why do we need it?
One of the many purposes of using homogeneous coordinates is to capture the concept of infinity. The coordinate system we normally use to denote the location of an object is the Cartesian coordinate system of Euclidean space. If we are considering a 3-dimensional space, a location is just a nice triplet of numbers! In this system, infinity is something that does not exist. I have covered it in detail here. Mathematicians have discovered that many geometric concepts and computations can be greatly simplified if the concept of infinity is used. But the constraint is that we cannot treat infinity like a regular number. Without homogeneous coordinates, it would be difficult to design certain classes of very useful curves and surfaces. These curves and surfaces are crucial in developing algorithms in computer vision, graphics, CAD, etc.
Projective geometry relies heavily on homogeneous coordinates as well. Now what is this “projective geometry”? Well, it is a very important field which deals with projective transformations. That wasn’t very helpful, was it? You might be wondering what these “projective transformations” are! Basically, when you project something on to a surface, it appears in a certain way depending on how you are holding that surface relative to the object and the light source. A good way to imagine this would be to think about cameras and the various ways in which you can capture an image. If you keep a book on a table and capture an image by holding the camera right above it, the book will appear as it is in the image. If you hold the camera at an angle, the book will not appear rectangular. We need a way to account for this kind of transformation. So in order to have a nice and clean mathematical design, mathematicians came up with homogeneous coordinates.
Why do we need projective geometry?
Before we proceed further in our discussion about homogeneous coordinates, let’s talk about projective geometry a little bit. We need to know what it is before jumping into its need for existence. It is basically the study of geometric properties that are invariant under projective transformations. We just talked about projective transformations, so you should have an idea about it by now. So projective geometry deals with properties that don’t change even if they undergo projective transformations.
We are all familiar with Euclidean geometry and with the fact that it describes our three-dimensional world so well. In Euclidean geometry, the sides of objects have lengths, intersecting lines determine angles between them, and two lines are said to be parallel if they lie in the same plane and never meet. This is all nice and intuitive! Moreover, these properties do not change when the Euclidean transformations (translation and rotation) are applied. However, when we consider the imaging process of a camera, it becomes clear that Euclidean geometry is insufficient. Lengths and angles are no longer preserved, and parallel lines may intersect. It all depends on how you are holding the camera when you capture the image! In our earlier example about the book and the table, you can see that the angles were not preserved. The rectangular shape doesn’t appear rectangular anymore in the captured image.
So some properties are not preserved! Why should I care?
Projective geometry models the imaging process of a camera nicely because it allows a much larger class of transformations. It is much more than just translations and rotations. It is a class which includes perspective projections, but the drawback is that it preserves fewer measures. That is, lengths, angles, and parallelism can all change when you capture an image. For example, if you hold your camera at an angle, the length of a line can appear shorter. It is important that you understand this because it is crucial in projective geometry, and in our overall discussion here. We all know that two distinct lines in a plane intersect at exactly one point, except when they are parallel. We have a rule, and there is an exception to it. That’s the reason mathematicians came up with a generalization of Euclidean geometry. They are not big fans of exceptions to their rules! We will now go ahead and embed the 2D plane in 3D space. We’ll do this by calling all lines passing through the origin (0,0,0) “points”. What does that even mean? Are we just calling them whatever we want? Not exactly! It will become clear as we move along. In the next step, we’ll call all planes passing through the origin “lines”. So basically, what we are doing here is increasing the dimension of the primitives by one. Points become lines, and lines become planes. The set of all these lines through the origin is called the projective plane.
Are we ever going to talk about “homogeneous coordinates”?
Yes, of course! It just took some time to build it up. So now we know what the “projective plane” is. We can think of the projective plane as a Euclidean plane with additional points. These points are located at infinity. Now how can anything be located at “infinity”? It doesn’t even exist! Well, that would be true in Euclidean geometry, but we are not in that realm anymore. We are defining a new realm where infinity can be defined. So in this projective plane, there is a point at infinity for each direction. The same idea holds if we are talking about a 3D space: there is a point at infinity for every direction in that space, not just for the three coordinate axes. Parallel lines in the Euclidean plane are said to intersect at the point at infinity corresponding to their common direction. In Euclidean terms, we would say “parallel lines never meet”. In our new realm here, we never say “never”. So an optimistic way of putting it would be “parallel lines meet at infinity”.
Let’s say we have a point (x,y) on the Euclidean plane. If we choose a non-zero real number Z, then the triple (xZ, yZ, Z) is called a set of homogeneous coordinates for the point. You can look at the picture here to visualize it. We basically place the point on the plane z=1, at (x, y, 1), and draw a line through it and the origin. Every point on that line, except the origin itself, is a set of homogeneous coordinates for that 2D point. By this definition, multiplying the three homogeneous coordinates by a common, non-zero factor gives a new set of homogeneous coordinates for the same point. In particular, (x, y, 1) is such a set of homogeneous coordinates for the point (x, y). For example, the Cartesian point (1,3) can be represented in homogeneous coordinates as (1,3,1) or (2,6,2). The original Cartesian coordinates are recovered by dividing the first two positions by the third. Thus, unlike Cartesian coordinates, a single point can be represented by infinitely many homogeneous coordinates.
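To make the conversion concrete, here is a minimal Python sketch of going back and forth between the two representations. The function names to_homogeneous and to_cartesian are made up for this illustration; they are not from any library.

```python
def to_homogeneous(x, y, z=1):
    """Return the homogeneous triple (x*z, y*z, z) for the Cartesian point (x, y).

    z is the non-zero scale factor called Z in the text (hypothetical helper).
    """
    if z == 0:
        raise ValueError("the scale factor must be non-zero")
    return (x * z, y * z, z)


def to_cartesian(h):
    """Recover (x, y) by dividing the first two entries by the third."""
    a, b, z = h
    if z == 0:
        raise ValueError("a triple with z = 0 has no Cartesian counterpart")
    return (a / z, b / z)


# Two different triples for the same 2D point
print(to_homogeneous(1, 3, 2))      # (2, 6, 2)
print(to_cartesian((2, 6, 2)))      # (1.0, 3.0)
print(to_cartesian((1, 3, 1)))      # (1.0, 3.0)
```

Whatever non-zero scale you pick, the division in to_cartesian cancels it out, which is exactly why all those triples describe the same point.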
You might be wondering about the points where the third coordinate z is set to 0. Well, points with coordinates k*(x,y,0), where x and y are not both zero, are called “vanishing points” or points at infinity. As we can see here, the corresponding lines through the origin lie in the plane z=0, so they never intersect the plane z=1. Okay, so what’s so nice about them? Let’s revisit intersections of lines for a second here. If you take two distinct projective lines, which are actually planes through the origin, they will always intersect in a line through the origin, which as we now know is a point. If the lines are not parallel, i.e. the intersections of their planes with z=1 are not parallel, then the resulting point will also intersect the plane z=1.
If the lines are parallel, this means that they will intersect in a point (which is actually a line through the origin) that is parallel to z=1. That line lies in the plane z=0, so it’ll be a “vanishing point”. For example, let’s say we have two planes that intersect along the x-axis. The x-axis never meets the plane z=1, and the two lines that these planes cut out on z=1 are both parallel to the x-axis. All vanishing points together form a line. The plane z=0 is called the “vanishing line”. In a projective plane, we can finally say that “any two distinct lines intersect in exactly one point”.
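Here is a small sketch of that intersection rule in Python. It assumes a particular representation that the text doesn’t spell out: a line a*x + b*y + c = 0 on the plane z=1 is stored as the triple (a, b, c), which is the normal of the plane a*x + b*y + c*z = 0 through the origin. With that assumption, the line where two such planes meet points along the cross product of their normals. The helper names are made up for this example.

```python
def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def intersect(line1, line2):
    """Homogeneous intersection point of two distinct lines (a, b, c).

    Assumes each line is stored as the normal of its plane through the origin.
    """
    return cross(line1, line2)


# y = x  is  x - y = 0      -> (1, -1, 0)
# y = -x + 2  is  x + y - 2 = 0  -> (1, 1, -2)
print(intersect((1, -1, 0), (1, 1, -2)))   # (2, 2, 2), i.e. the point (1, 1)

# y = x and y = x + 1 are parallel; x - y + 1 = 0 -> (1, -1, 1)
print(intersect((1, -1, 0), (1, -1, 1)))   # (-1, -1, 0), a point at infinity
```

The second result has a zero in the last position, so it cannot be divided back to a Cartesian point. It is the vanishing point for the direction (1, 1) that both parallel lines share.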
Why are lines parallel to z=1 called “vanishing points”? Remember that z=1 is just one affine view of our projective plane. We could take any other plane (not going through the origin) and intersect it with our “points” and “lines” to get another affine view. For example, if you tilt the plane z=1 a bit so that it intersects the x-axis, then you would be able to “see” the vanishing point which is the intersection of the two parallel lines.
What’s the advantage of going through all this confusion?
Well, it may seem confusing to a first time reader, but it’s an extremely useful concept. Homogeneous coordinates are used extensively in computer vision and graphics because they allow common operations such as translation, rotation, scaling and perspective projection to be implemented as matrix operations.
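As a rough illustration of what “implemented as matrix operations” buys us, here is a sketch in plain Python for the 2D case. A 2D point (x, y) is written as (x, y, 1), and translation, scaling and rotation each become a 3x3 matrix. All function names here are invented for the example.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def scaling(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]


def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def apply(m, point):
    """Apply a 3x3 matrix to a 2D point via its homogeneous form (x, y, 1)."""
    x, y = point
    h = (x, y, 1)
    hx, hy, hw = (sum(row[j] * h[j] for j in range(3)) for row in m)
    return (hx / hw, hy / hw)


# Scale by 2 first, then translate by (3, 1), as ONE combined matrix
combined = matmul(translation(3, 1), scaling(2, 2))
print(apply(combined, (1, 1)))   # (5.0, 3.0)
```

Without the extra coordinate, translation is an addition and cannot be written as a 2x2 matrix, so it could not be chained with rotation and scaling by simple multiplication.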
Let’s consider perspective projection. We know that a position in space is associated with the line from it to a fixed point called the center of projection. In the figure above, it’s the origin. This point is then mapped to a plane by finding the point of intersection of that plane and the line. The plane is z=1 in our case, as seen from the figure. The interesting thing to note here is that this is how a three-dimensional object would appear to the eye. So by doing this, we are getting an accurate representation of how that object would appear to us. Once we get this model, we take it and apply it to a camera. After all, we want the camera to simulate the human visual system as closely as possible.
To keep the discussion simple, let’s say the center of projection is the origin and points are mapped to the plane z = 1. Let’s consider any point in space denoted by (x, y, z), with z not equal to zero. If we draw a line from the origin to this point, then it would intersect the plane at (x/z, y/z, 1). Since we know that we are talking about the plane z=1, we can just write (x/z, y/z) to indicate the point on that plane. In homogeneous coordinates, the point (x, y, z) is represented by (xw, yw, zw, w), where w is a non-zero real number. The point it maps to on the plane is represented by (xw, yw, zw), because dividing the first two entries by the third gives back (x/z, y/z). So projection can be represented in matrix form as:
1 0 0 0
0 1 0 0
0 0 1 0
This is a matrix that can represent various geometric transformations depending on how you choose to fill up its values. As a result, any perspective projection of space can be represented as a single matrix. Isn’t it beautiful? You can just modify the values in this simple matrix and multiply it with your point to get whatever you want.
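Here is a minimal sketch of that projection in Python, using the 3x4 matrix with ones on the diagonal described above. The name project and the optional w argument are assumptions made for this illustration.

```python
# The 3x4 projection matrix: it simply drops the fourth coordinate
P = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0]]


def project(point3d, w=1):
    """Project a 3D point onto the plane z = 1 through the origin.

    w is the arbitrary non-zero scale of the homogeneous 4-vector.
    """
    x, y, z = point3d
    h = (x * w, y * w, z * w, w)                      # homogeneous 4-vector
    px, py, pz = (sum(row[j] * h[j] for j in range(4)) for row in P)
    if pz == 0:
        raise ValueError("a point with z = 0 never reaches the plane z = 1")
    return (px / pz, py / pz)                         # (x/z, y/z)


print(project((2, 4, 2)))         # (1.0, 2.0)
print(project((2, 4, 2), w=5))    # (1.0, 2.0), the scale w does not matter
```

The matrix multiplication itself is linear. The only non-linear step, the division by z, is postponed until we convert back from homogeneous coordinates.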
What does it all mean?
In the real world, we deal with 3D coordinates. So homogeneous 3D coordinates are represented by 4D vectors, where the fourth coordinate is non-zero for ordinary points. This is just a generalization of the things we have discussed so far. Let’s go through a quick summary. Two vectors (x,y,z,w) and (x’,y’,z’,w’) represent the same point in 3D space if one is a non-zero multiple of the other. If the fourth coordinate is 1, then it is called the normalized form of the homogeneous coordinates. Points of the form (x,y,z,0) are again called “vanishing points”, or points at infinity. Homogeneous coordinates are basically used in the field of projective geometry, which generalizes affine geometry.
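The summary above can be sketched in a few lines of Python. The names same_point and normalize are hypothetical, and the exact equality test is only reliable for integer (or otherwise exact) inputs; with floats you would want a tolerance.

```python
def same_point(p, q):
    """True if the 4-vectors p and q are non-zero multiples of each other."""
    if not any(p) or not any(q):
        raise ValueError("(0, 0, 0, 0) is not a valid homogeneous vector")
    # p and q are proportional exactly when all 2x2 minors vanish
    return all(p[i] * q[j] == p[j] * q[i]
               for i in range(4) for j in range(i + 1, 4))


def normalize(p):
    """Scale a homogeneous 4-vector so that its fourth coordinate is 1."""
    x, y, z, w = p
    if w == 0:
        raise ValueError("a point at infinity cannot be normalized")
    return (x / w, y / w, z / w, 1.0)


print(same_point((1, 2, 3, 1), (2, 4, 6, 2)))   # True
print(same_point((1, 2, 3, 1), (1, 2, 3, 2)))   # False
print(normalize((2, 4, 6, 2)))                  # (1.0, 2.0, 3.0, 1.0)
```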