If you follow tech news, you are probably aware of the recent security leak at LinkedIn. A Russian hacker leaked around 6.5 million LinkedIn passwords along with 1.5 million passwords from a dating website (possibly eHarmony), bringing the total to around 8 million. If you haven’t changed your LinkedIn password yet, it’s about time! Now how on earth could this have happened? What exactly went wrong?
Is it that simple?
Not exactly. Whenever a user creates an account and enters a password, the password is not stored as it is. That would be too much of a risk if someone were to get their hands on the stored data. What we need is something we can store so that even if someone gets it, they cannot do anything useful with it. So the password is scrambled using a cryptographic hash function. The output is a fixed-length sequence of bits, and it is computationally infeasible to work out the password from this sequence. This sequence of bits is stored instead of the password. When the website wants to authenticate you, you enter your password and the site applies the same transformation to it. If the output matches the stored sequence of bits, then you’re in.
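To make the store-and-compare idea concrete, here is a minimal sketch in Python. The function names (`scramble`, `check_login`) and the choice of SHA-256 are my own assumptions for illustration; this is not how any particular website does it.

```python
import hashlib
import hmac

def scramble(password: str) -> str:
    # Hypothetical helper: turn a password into a fixed-length hex digest.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def check_login(attempt: str, stored_hash: str) -> bool:
    # Re-apply the same transformation to the attempt and compare digests.
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(scramble(attempt), stored_hash)

# At sign-up, only the digest is stored, never the password itself.
stored = scramble("correct horse")

print(check_login("correct horse", stored))  # True
print(check_login("wrong guess", stored))    # False
```

Notice that the site never needs to reverse the hash. It only needs to repeat the same computation and compare.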
Password scrambler: Hashed with a pinch of salt
A cryptographic hash function is a function which takes in arbitrary data and returns a fixed-size sequence of bits. The main property being used here is that it is computationally infeasible to recover the original data if you are given only this sequence of bits. You will not even know how long the data was, because the output always has the same length. For example, let’s say you have the number “34” and you want to hide it by applying a hash function. So you add “51” to it and store “85”. Now, if a hacker sees “85”, he/she cannot tell what the original number was without knowing more details about the hashing function. It can be any combination (80+5, 19+66, 50+35 etc). This toy example is only an illustration: simple addition is easy to undo once you know the rule, whereas a real hash function stays hard to reverse even when the algorithm is public. In real life, the function is much more complex and the output is a very long sequence. On top of that, websites usually limit the number of times anyone can attempt to enter a wrong password.
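Here is a small sketch of the fixed-length property using Python's standard `hashlib` module and SHA-1. The sample inputs are arbitrary choices of mine.

```python
import hashlib

def sha1_hex(text: str) -> str:
    # SHA-1 always yields 160 bits, shown here as 40 hex characters.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

short_digest = sha1_hex("abc")
long_digest = sha1_hex("a much, much longer input " * 1000)

print(short_digest)       # a9993e364706816aba3e25717850c26c9cd0d89d
print(len(short_digest))  # 40
print(len(long_digest))   # 40

# A one-character change in the input gives a completely different digest.
print(sha1_hex("abd") == short_digest)  # False
```

Both digests are 40 hex characters long, so the output tells you nothing about how long the input was.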
To make it more robust, a random sequence of bits is combined with the original password before the hash function is applied. This random data is called “salt”. Because the salt is mixed into the input, the same password produces a different hash for every salt value, so a list of hashes computed in advance for common passwords no longer matches anything. If the salt is large enough, a precomputed dictionary attack becomes impractical. A dictionary attack is different from brute force in the sense that only those passwords are tried which are more likely to succeed. It’s like an intelligent brute force attack. When you set your password on a well-designed website, it is converted to this salted cryptographic hash and then stored, with the salt kept alongside it so that the same computation can be repeated when you log in.
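The salting step can be sketched as follows. The helper names, the 16-byte salt, and the choice of SHA-256 are assumptions I made for the example; treat it as an illustration of the idea rather than a recommended production scheme.

```python
import hashlib
import hmac
import os
from typing import Tuple

def hash_with_salt(password: str, salt: bytes) -> str:
    # Mix the salt into the input before hashing.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

def create_record(password: str) -> Tuple[bytes, str]:
    # Hypothetical sign-up step: a fresh random salt for every user.
    salt = os.urandom(16)
    return salt, hash_with_salt(password, salt)

def verify(attempt: str, salt: bytes, stored_hash: str) -> bool:
    # Login repeats the computation with the stored salt.
    return hmac.compare_digest(hash_with_salt(attempt, salt), stored_hash)

# Two users who pick the same password end up with different stored hashes.
salt_a, hash_a = create_record("hunter2")
salt_b, hash_b = create_record("hunter2")

print(hash_a == hash_b)                   # False, since the salts differ
print(verify("hunter2", salt_a, hash_a))  # True
print(verify("hunter3", salt_a, hash_a))  # False
```

The salt is not a secret. Its job is to make sure that work done on one hash cannot be reused against another.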
OK, so what happened with LinkedIn?
LinkedIn uses the SHA-1 cryptographic hash function to generate these hashes. SHA stands for Secure Hash Algorithm. We’ll reserve discussing hash functions for another blog post. I just wanted to point out that it has been a widely used standard for quite some time now. The SHA family keeps improving with time, and new variants keep coming out. Now, the 6.5 million leaked LinkedIn passwords don’t use a cryptographic salt, which makes it much easier for the hacker to crack them. The other 1.5 million passwords use MD5 hashes and they are unsalted too. Why on earth would they not use salt to store the passwords? Well, your guess is as good as mine in this case.
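To see why the missing salt matters so much, here is a rough sketch of a dictionary attack against unsalted SHA-1 hashes. The user names, passwords, and word list are all made up for the example, and I am not claiming this is how the actual attackers worked.

```python
import hashlib

def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Hypothetical leaked table of unsalted hashes (user -> SHA-1 digest).
leaked = {
    "alice": sha1_hex("password"),
    "bob": sha1_hex("letmein"),
    "carol": sha1_hex("password"),
    "dave": sha1_hex("x9!Lq2#vT7"),
}

# A tiny stand-in for an attacker's list of likely passwords.
wordlist = ["123456", "password", "letmein", "qwerty"]

# Each candidate is hashed once, then reused against every leaked hash.
lookup = {sha1_hex(word): word for word in wordlist}

cracked = {user: lookup[h] for user, h in leaked.items() if h in lookup}
print(cracked)
# {'alice': 'password', 'bob': 'letmein', 'carol': 'password'}
```

Without salt, alice and carol have identical hashes, so cracking one cracks both. With a per-user salt, the attacker would have to redo the hashing work for every single user.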
Is it really encryption?
Purists will argue that this is technically not “encryption” per se, and they are right. This is not exactly encryption. It is a one-way function designed to make the system more robust. This is the reason I used the word “scrambled” instead of “encrypted” earlier in this post. If you encrypt something, you can get the original data back as long as you know the encryption scheme and the key. A cryptographic hash function, on the other hand, will not give you the original data back. There are a few other details to support this argument, but you get the gist of it.
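The difference can be shown with a toy example. The XOR “cipher” below is something I made up purely to demonstrate reversibility; it is not a secure encryption scheme and has nothing to do with what LinkedIn uses.

```python
import hashlib

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy two-way scheme: applying it twice with the same key undoes it.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

message = b"my secret"
key = b"k3y"

# Encryption is reversible for anyone who holds the key.
ciphertext = xor_cipher(message, key)
recovered = xor_cipher(ciphertext, key)
print(recovered == message)  # True

# A hash has no key and no inverse operation. The only way to check a
# guess is to hash the guess and compare the two digests.
digest = hashlib.sha1(message).hexdigest()
print(hashlib.sha1(b"my secret").hexdigest() == digest)  # True
print(hashlib.sha1(b"my secreT").hexdigest() == digest)  # False
```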
LinkedIn is currently working with law enforcement to investigate the breach and to work out how to safeguard its systems. It has been quite some time since we saw a security leak on such a big scale. Hopefully they will get everything back on track soon and tighten up their security.
————————————————————————————————-
Interesting read. I wanted to mention how exactly attackers (in general) are able to guess passwords from the hash values. They use something called rainbow tables, which are essentially tables of precomputed hashes (more can be found on Wikipedia). There are tools such as Cain and Abel (which can currently crack Windows passwords). I am assuming the attackers used something similar.
Thanks for the feedback. I wanted to include rainbow tables along with password crackers (like Cain and Abel, John the Ripper etc), but I was afraid I would have to delve deep into the realm of password-cracking. If I had mentioned them, I would have had to discuss them in detail, because people who are not familiar with the topic might find a passing mention a bit out of place. But when I write a post on password-cracking, I will definitely discuss this further.