We encounter digital images every day. We see a lot of JPEG files on our computers, cameras, phones and tablets. The underlying images are huge, and storing all that raw data should take up a lot of space. Yet somehow our machines compress all those images and store everything compactly. Ever wondered how it’s possible to fit so many images in such a small space? How can the JPEG algorithm achieve such a large reduction in size without visibly losing image quality?
Come on! How much space can it possibly save?
Let’s make a typical estimate here. Consider a 12MP camera: it captures 12 million pixels per image. A typical color image has 3 channels, and each value in each channel needs 8 bits of memory, so each pixel needs 24 bits. With 12 million pixels, we need around 36 MB of memory to store a single image. Whoa! Imagine a world where a single image consumes that kind of space. This is why we need image compression.
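The arithmetic above is easy to check for yourself. A quick back-of-the-envelope sketch (the 12MP camera and 8 bits per channel are the same assumptions as above):

```python
# Back-of-the-envelope estimate of raw image size for a 12MP camera.
pixels = 12_000_000        # 12 megapixels
channels = 3               # e.g. red, green, blue
bits_per_sample = 8        # 8 bits per channel value

bits_per_pixel = channels * bits_per_sample     # 24 bits per pixel
total_bytes = pixels * bits_per_pixel // 8      # convert bits to bytes

print(total_bytes / 1_000_000, "MB")  # → 36.0 MB
```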
Alright, you made your point. What’s JPEG anyway?
JPEG stands for Joint Photographic Experts Group, and it’s the most popular image file format in the world. Whenever we capture images with a camera, the camera sensor captures the raw data. This means it just stores everything as it is. The amount of data captured depends on the resolution of your camera: a 5MP camera captures less data than a 16MP camera. Once we have this data, the software on the device compresses it.
How is it possible to compress images? Will we not lose important data?
There are many formats in which you can represent an image, but the one most suitable for compression is a 3-channel format (JPEG uses one called YCbCr) where the first channel stores the brightness information and the remaining two store the color details. It’s like stacking three planes on top of each other, where each channel is a plane. Each pixel is represented by three values, one from each plane, at that particular location. The human eye is less sensitive to changes in color than to changes in brightness, and JPEG takes advantage of this. The color planes carry more information than the eye needs, so they are subsampled by a factor of 2, which means their resolution is reduced.
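Here is a small NumPy sketch of that step, using the standard JFIF conversion coefficients for brightness (Y) and the two color planes (Cb, Cr); the tiny random 4×4 image is just a stand-in for a real photo:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an HxWx3 uint8 RGB image into Y, Cb, Cr planes (JFIF convention)."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    y  =  0.299    * r + 0.587    * g + 0.114    * b          # brightness
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128    # blue-difference color
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128    # red-difference color
    return y, cb, cr

# Stand-in image: 4x4 pixels of random color
rgb = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
y, cb, cr = rgb_to_ycbcr(rgb)

# Subsample the color planes by 2 in each direction; keep brightness at full size
cb_sub = cb[::2, ::2]
cr_sub = cr[::2, ::2]
print(y.shape, cb_sub.shape, cr_sub.shape)  # → (4, 4) (2, 2) (2, 2)
```

Counting samples shows the saving: the original image has 48 values (16 pixels × 3 channels), but after subsampling we store only 16 + 4 + 4 = 24, half as much, before any other compression has even started.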
The image is then split into small 8×8 blocks, and each block undergoes a transformation. This transformation, called the Discrete Cosine Transform (DCT), produces a spatial frequency spectrum. Now what on earth is that? It means you are representing the same thing in a different way, which makes it easier to process. For example, instead of representing 15 as 4+11, you represent it as 7+8. This is where the real magic happens. The human visual system is much more sensitive to small variations in brightness than to high-frequency variations. A flat region is a low-frequency region, and a region with a lot of edges and texture is a high-frequency region. The human eye can easily see blotches on a plain surface, but it’s difficult to spot something similar on a highly textured surface.
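To make “spatial frequency spectrum” concrete, here is a minimal NumPy sketch of the 2-D DCT on a single 8×8 block. This is the orthonormal type-II DCT written out directly; real encoders use fast factored versions of the same transform:

```python
import numpy as np

def dct2(block):
    """2-D type-II DCT of an NxN block, with orthonormal scaling."""
    N = block.shape[0]
    n = np.arange(N)
    # Basis matrix: row u samples the cosine wave of frequency u
    C = np.sqrt(2 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0, :] = np.sqrt(1 / N)
    return C @ block @ C.T

# A perfectly flat block has no variation, i.e. no high frequencies:
flat = np.full((8, 8), 100.0)
coeffs = dct2(flat - 128)   # JPEG level-shifts samples by 128 before transforming

# All the energy lands in the single top-left ("DC") coefficient
print(coeffs[0, 0])         # ≈ -224 (8 × the block mean of -28)
```

Every one of the other 63 coefficients comes out as zero for this block, which is exactly the point: smooth image regions need only a handful of numbers once you look at them in the frequency domain.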
Okay so how do you actually compress an image?
When you transform something, you can take the inverse transform to get back the original. We used the transformation above to separate the different frequency regions. If we store everything, we can reproduce the whole image exactly as it was. But if we reconstruct the image with just the low-frequency components of the transform, we will barely be able to tell the difference, because our eyes hardly register that the high-frequency components are missing. This means we can store far less data and still reproduce what looks like the same image. You may be surprised to learn that this step (called quantization) can discard around 90% of the data captured by the camera sensor. That’s a lot of data! It is redundant as far as the human eye is concerned.

After this, we do another round of compression and squeeze out the remaining statistical redundancy. The technique used here is Huffman coding, a lossless data compression scheme. We will save the details for another blog post. JPEG as a whole is a lossy compression scheme, which means we discard a lot of data by tricking the human eye. But once the data has been quantized, we cannot afford to lose any more of it in storage and transmission. That’s why the final stage uses Huffman coding, which is lossless. This compressed data is then finally stored as a JPEG image, which we are all familiar with.
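Here is a toy NumPy sketch of the discarding idea: transform a smooth block, throw away the high-frequency coefficients, and invert. Real JPEG quantization divides each coefficient by an entry from a tunable table and rounds, rather than zeroing outright, and the gradient block here is just a stand-in for a smooth patch of a photo:

```python
import numpy as np

N = 8
n = np.arange(N)
# Orthonormal DCT-II basis matrix, as in the previous sketch
C = np.sqrt(2 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
C[0, :] = np.sqrt(1 / N)

def dct2(b):  return C @ b @ C.T    # forward transform
def idct2(b): return C.T @ b @ C    # inverse (C is orthonormal)

# A smooth gradient block: its energy sits almost entirely in low frequencies
block = 4.0 * np.add.outer(np.arange(N), np.arange(N))
coeffs = dct2(block - 128)          # level-shift, then transform

# Keep only the 4x4 low-frequency corner; zero the other 75% of coefficients
kept = np.zeros_like(coeffs)
kept[:4, :4] = coeffs[:4, :4]

approx = idct2(kept) + 128
max_err = np.abs(approx - block).max()
print(f"worst pixel error after dropping 75% of the coefficients: {max_err:.2f}")
```

On this smooth block the worst pixel ends up off by only about one intensity level out of 255, which is invisible. A block full of sharp edges would fare much worse, which is why heavy JPEG compression shows artifacts around text and edges first.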
There are other image compression algorithms as well, but they all exploit the human visual system in a similar way. The amount of compression depends on how much distortion you are willing to tolerate, and compression tools let you control it through a quality setting. If you want to print a big poster, you are better off storing more data (less compression). But if you just want to store images on your computer, you can compress more aggressively.