As web developers, we hear it all the time: storing user passwords is an aspect of development that needs to be taken seriously. It can be easy to believe your less than stunningly popular app or site need not to worry much about keeping sensitive user data safe, what with even big players on the web being caught out with sub-par password storing measures on a far too regular basis, but it really isn't that hard a task to bring your password storing measures up to scratch, particularly when the difference between making the right and wrong choice is so impressively huge.
In a previous article, I touched on the concept of logical password salting in PHP, which also covered the general idea of using unique salts in password hash generation, and on using a suitable hashing capability, like bcrypt. That article offers some practical measures you can take to enhance password storage in your code, while in this article I will instead focus on why exactly any old hashing algorithm won't do, and why choosing only suitable types of hashes for password storage is important.
Imagine the following scenario: your server has been hacked, and you fear your data has been stolen. How it happened is really not important now. Maybe you left an old version of a popular framework or tool web accessible, maybe you were targeted by a zero day flaw, or maybe you're on a shared host and were unlucky enough to be collateral damage. It also doesn't really matter that your data still seems intact, as obviously any intruder likely just took a copy and went on his way. At this point, knowing you were successfully breached is half the battle. With this knowledge, you can now focus on nullifying the usefulness of any sensitive data stolen, such as enforcing password changes on users, and inform your user base of the breach so they can think about where else they may have used the password (notifying users/customers could actually be required by law, depending on where in the world you're hosting the data or offering the service). Of course, no company or service owner would ever look forward to owning up to such a breach publicly, but wouldn't it be nice to be able to tell users that their passwords were at least stored using a world class hashing algorithm?
Let's pretend that this hypothetical breach scenario happened on a newly launched app with a tight release deadline; so tight, the developers didn't think they had time to investigate best practices for storing passwords, and instead went with the trusty MD5 algorithm. A hash is a hash, right? Possibly the most commonly known hashing technique, MD5 certainly has its practical uses: it can be used to verify the integrity of data transferred between two remote points, it can be used to normalize data into a reproducible and (almost always) unique range of characters, and various other applications along those lines. However, it should not be used to hash passwords.
To illustrate my point of why MD5 isn't suitable for password hashing, I will use a piece of software called oclhashcat. Touted as the world's fastest password cracker, oclhashcat is a powerful but unassuming command line tool that utilises the grunt of your GPU(s) to crack passwords, using either brute force, dictionaries you feed it, and various other techniques. It turns out that GPUs are quite the hash generating speed demons, particularly those manufactured by AMD, and even fairly modest GPUs can put otherwise very powerful and expensive CPUs to shame when it comes to executing hashing and encryption algorithms.
In this first example, I will be using a reasonably modern AMD Radeon R9 270X video card (which I picked up recently in an attempt to rekindle the PC gaming flame), and the following MD5 hash:
Keen readers may even recognise it: it's the MD5 for the string 'password' (without the quotes). For the purposes of this article, it doesn't really matter that the password is a glaringly weak one due to its English meaning, as I will be demonstrating the capability of oclhashcat + GPU using its brute force mechanism (meaning 'password' is effectively no weaker than any 8 character lower case English letter string). Of course in the real world, such a crummy password would be very quickly cracked using a rainbow table or directory attack technique, but let's pretend the hacker's first port of call for cracking the passwords after obtaining the data was brute force.
So, how strong is this MD5 hash? Well, it's not a pretty picture. Running the command
oclHashcat64.exe -m0 -a3 5f4dcc3b5aa765d61d8327deb882cf99 ?l?l?l?l?l?l?l?l on my mid range consumer video card cracked the hash in 7 seconds. Yes, you read correctly, seven seconds. It chewed through 5366.1 million MD5 combinations per second.
Granted, 'password' isn't just a terrible password for the obvious reason, it is also quite poor due to the fact it is so limited in variation: there are no upper case letters, no numbers, no symbols. Such variation has a massive effect on the strength of a password, because it simply means the amount of combinations possible when brute forcing start to grow. For instance, if you run the same command as above, but change the ?l to ?a in the mask (which tells oclhashcat to use lower and upper letters, digits and symbols), then the expected amount of time to crack the hash is 14 days. Increase that to 12 characters instead of 8, and it balloons to over 10 years. This is obviously a phenomenal difference, and goes to show that even a poor hashing algorithm like MD5 can secure a password against brute force if the password is good. Of course, not every user will appreciate this, so most passwords will inevitably be much weaker, and considering it is quite accessible to build multi GPU computers these days or even rent cloud super computing power, what is considered an unreasonable amount of time on my lowly mid range AMD GPU may not necessarily be the same for a well resourced hacker.
So the conclusion on MD5 as a password hasher isn't very good, but before I demonstrate how much a good hasher can change things, let me just emphasize that this isn't just all about MD5; there are other hashing algorithms that have also burned their candles of usefulness as a password storage hash down to a stub. Using SHA1 on my setup, 'password' was cracked in 18 seconds. SHA256? 44 seconds. SHA512? less than an hour. These allotments of time are all far too short, and therefore they make lousy hashing algorithms for password storage.
In the original example above I showed how a MD5 of the string 'password' can be cracked within 7 seconds flying at many millions of hash generations per second, but how about bcrypt? As outlined in the article I linked to above, bcrypt is considered a much more suitable method of hashing passwords, but just how much better is it in real terms? Here is the hash that I used to test against (again, it's the string 'password'), generated by PHP's bcrypt implementation:
And the command, which is similar to the original:
oclHashcat64.exe -m3200 -a3 '$2a$10$GxDGxtUviGEAdliZFiPOTehfe/mLx/TNjUcG1YJhHqqxR65kvF/2W' ?l?l?l?l?l?l?l?l
Ready for this? my AMD GPU, the very same that demolished the MD5 algorithm at 5366.1 million attempts per second, cranked out bcrypt examples against the aforementioned hash at a grand total of 94 hashes per second. Not million, not hundred thousand, not even thousand: ninety freaking four. With 208827064576 possible hash combinations of 26 lower case letters across 8 characters (i.e. 26 ^ 8) at 94 hashes per second, that means it would take me about 70 years to exhaust the whole list. Keep in mind this is only for 8 character lower case alphabetical passwords; it is literally a direct comparison with the 7 seconds MD5 example. Against the 18060255536776954753 possible combinations in the 12 character full range example that was going to take 10 years with MD5, the bcrypt hash would take the comically large amount of time in the range of over 6 billion years.
So why is bcrypt so much better? this wikipedia article is as good a place to read up as any, but basically it comes down to the fact that bcrypt is really a function that allows an iteration count (the '10' near the start of the hash) to prevent its hash outputs from being subject to easy cracking once hardware eventually increases in power. It can basically keep up its complexity over time, and it is this complexity which prevents hashes from being generating at a speed reasonable for brute force cracking. The reason this is great for password hashing is because 94 hashes per second is a perfectly fine speed to generate a single hash when a user creates an account or changes their password, but it is an unrealistic speed for nefarious purposes. Combined with unique salts per password, which bcrypt and functions like it are designed to accommodate, any breach of user password hashes would be almost entirely pointless for a hacker to posses.
If you came about this article not really appreciating why simple hashes are about as useless as clear text when it comes to password storage, I hope the reasoning is a bit clearer for you now. The reality of the situation is password safety also relies heavily on users being sensible with their password usage across platforms, and sensible with the complexity of passwords they create. It is also true that rainbow tables and dictionary attacks will always be a threat in picking off the low hanging fruit. However, by making sure your code is based on strong security principles like always choosing appropriate ways to hash stored passwords, you make sure your end of the deal is being satisfied, and there isn't much more users can ask of a service's security than that.