In statically typed programming languages, datatypes play a very important role. Some common datatypes are int, float, char, double, etc. Different datatypes have different functionalities, and we use them depending on our requirements. In modern C++ code, the type “size_t” is used a lot instead of int or unsigned int. It appears in many different scenarios, such as parameters to string functions and the standard template library (STL). Ever wondered why we need to use it in the first place? Does it have any real advantage?
First of all, what is size_t?
In C++, size_t is an unsigned integer type that is the result of the “sizeof” operator. The good thing about size_t is that we can be certain that it is big enough to contain the size of the biggest object our system can handle, for example a static array of 4 GB. Technically, it can be smaller than, equal to, or larger than an “unsigned int”. This allows the compiler to take the necessary steps for optimization purposes.
Wait a minute, how can it be different from the unsigned int type? The size_t type is the type returned by the “sizeof” operator, which, on some platforms, happens to be unsigned int. It is an unsigned integer that can express the size of any memory range supported on our machine. It may just as well be unsigned long or unsigned long long. When writing C++ code, it’s good to use size_t whenever we are dealing with memory ranges.
A bit more about size_t
One thing to note is that size_t is never negative. Also, it maximizes performance because of its very definition. Now what does that mean? It means that it is type-defined to be the unsigned integer type that’s big enough to represent the size of the largest possible object on the target platform. Hence, we are allowing the compiler to make optimizations based on the situation. Since size_t is unsigned, we can store numbers that are roughly twice as big as in the corresponding signed type, just as with int vs unsigned int. The bit that would otherwise hold the sign is used for magnitude instead, so the biggest representable value roughly doubles.
Is it better than using other datatypes?
Let’s talk about “int” for a minute. The int datatype is defined to be the integer size that our machine can use to perform integer arithmetic efficiently. We should use the int type only when we care about efficiency, because its real precision depends on the compiler options as well as the architecture of the machine.
At this point, you might be wondering why we can’t just use “unsigned int” and call it a day, right? The problem is that it may not be able to hold big enough numbers. Let’s consider a system where the size of “unsigned int” is 32 bits. The biggest number that can be represented is then 4294967295 (2^32 − 1). Some processors can actually copy objects larger than 4294967295 bytes.
Okay, so why not just use an “unsigned long int”?
That should solve the problem, right? But the thing about using “unsigned long int” is that we now have to deal with a performance cost. As you move to unsigned long, performance starts to degrade on some systems. The C standard says that a “long” must occupy at least 32 bits. On 16-bit platforms, performing a 32-bit operation requires two or more instructions, because the hardware works with the 32 bits in two chunks of 16 bits each. So, as we can see, even moving a 32-bit long datatype requires two machine instructions there.
We can avoid all these performance issues by using size_t. When we use the type “size_t”, we are actually using a typedef that’s an alias for some unsigned integer type. That unsigned integer type can be unsigned int, unsigned long, or unsigned long long. The implementation is then free to choose an unsigned integer that’s big enough for our needs, but no bigger than needed, to represent the size of the largest possible object on our platform.