PHP bcrypt hash a password with a logical salt

June 09, 2012

As I write this opening paragraph, yet another high-profile website has seen its collection of user data, including passwords, stolen by a successful hack - in this case I'm referring to LinkedIn.com (but if you're reading this well into the future, another name can probably be substituted here as the latest big victim). Unfortunately, if you're a big enough target, being a victim of a hacker is usually more a matter of "when" than "if", but let's not let LinkedIn off that easily - those passwords I mentioned? Yeah, they were stored as SHA1 hashes - without salting (let alone per-password salting).

It's amazing how common this seems to be, even amongst the supposedly trendy Web 2.0 brands out there. Unless I'm mistaken, LinkedIn are one of the big names associated with such cutting-edge technologies as node.js, and yet their security practices are, like, so ColdFusion MX 7 (keeping the web language analogy going). I guess the reality is that the commercial web often just doesn't think its sensitive user data is worth protecting.

You're different though - after all, you found this article for a reason. You're a PHP coder and you want your sites and web apps to at least pose any hacker a bit of a challenge - maybe even so much of a challenge that they move on to an easier target. In this article, I will cover a rather quick and painless way to store your passwords that is not only a hell of a lot better than what LinkedIn went with, but should also scale into the future as hardware gets faster without having to rewrite everything. To do this, I'll be using bcrypt hashing, which can be utilised in PHP 5.3+ via the crypt() function's BLOWFISH capabilities.

Before jumping ahead with some code, I'll outline some decisions I made regarding the salting technique in this article. Firstly, not only will salting obviously be used, but what I'm covering here will use a unique salt per user/password combo. This means an attack where a hacker matches a collection of known hashed values (i.e. a rainbow table) against your hashed values becomes impractical - unless the hacker wants to generate a complete rainbow table for every password hash in your table (hence "impractical", not "impossible").

With this said, there is a decision to be made on how the salt is generated. The popular convention here is to generate a random salt per password. This is a fine method, but it does have one small drawback - you'll have to store this salt somewhere, probably in the hash itself, as this is what PHP's crypt() function returns. If you open the PHP manual page for crypt(), you'll see that the first 29 characters of a BLOWFISH return from crypt() are the salt string (the hash type and cost, followed by the salt itself). While the presence of a unique salt per password alone is effective enough to make cracking a bcrypt hash a huge undertaking, why give the hacker anything you don't absolutely have to give them, at least so easily?
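To make that layout concrete, here is a minimal sketch that splits a crypt() result into its two parts. The fixed salt characters and the low cost of 04 are illustrative assumptions of mine, chosen only so the example runs quickly.

```php
<?php
// Illustrative values only: a fixed 22-character salt and a low cost of 04.
$setting = '$2a$04$' . 'abcdefghijklmnopqrstuu';
$full = crypt('test', $setting);

// A bcrypt result from crypt() is 60 characters long.
$saltPart = substr($full, 0, 29); // "$2a$", two-digit cost, "$", 22 salt characters
$hashPart = substr($full, 29);    // the remaining 31 characters: the hash itself

echo strlen($full), "\n";
echo $saltPart, "\n";
echo $hashPart, "\n";
```

Anyone holding the full string can read the salt straight off the front of it, which is the part this article avoids storing.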

The method I'll use in this article for salt generation is a bit different - instead of a randomly generated salt, I'll be using a method of logically reproducing the salt, per user. To do this, I'll be using the commonly used unique identifier for a user - the username. This means technically we will have a unique salt per user and not per password, but in the vast majority of real world cases, this is going to be effectively the same thing.

So, I assume you are wondering why you'd do this? Well, I'll tell you even if you weren't - because this means not only would a hacker need to steal a copy of your user table from your database, but the hacker would also need to get access to the logic that creates the salt - i.e. your app's source code. In some instances this will be no problem for a hacker who already has enough access to get your database, but not all instances will be like this. What if your web server and database server aren't on the same box? What if your database was accessed via an exploit that has nothing to do with the file system of your server? It may not always be possible for the hacker to get both your data and your code, but even if it is, what harm is there in adding another step to their process?

So let me summarise the concept here in one paragraph before we proceed to the code - by going this route, instead of storing the entire hash which crypt() returns, we will store it minus the first 29 characters - that is, minus the salt. When we need to process a login attempt or any other event needing user-based password matching, we will generate the salt in real time based on the username and pass that into crypt() as the salt. Make sense? I sure hope so, because there is some code coming your way - first up, our salt generator:

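function generateSalt($username) {
    // "$2a$" selects CRYPT_BLOWFISH (bcrypt); "13" is the two-digit cost.
    $salt = '$2a$13$';
    // md5() yields 32 hex characters; crypt() uses the first 22 as the salt.
    $salt = $salt . md5(strtolower($username));
    return $salt;
}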

If you didn't check out the link to the PHP manual page for crypt(), I'd recommend doing so now, as it will explain what is going on here (you want to read the "CRYPT_BLOWFISH" section). As you can see, we first define our hash type by starting the salt with "$2a$", which is then followed by "13$". The "13" here is our cost parameter, which directly influences how intensive the hashing is: it is the base-2 logarithm of the iteration count, so each increase of 1 doubles the work. As the manual says, the accepted range of values is 04 to 31 - if you use 31, the hashing process will likely be very slow and unrealistic for most purposes (unless your server has some serious CPU grunt). I chose 13 because it is a decent amount of cost without being too slow - remember, this hashing happens on every request which needs a user's password to be matched, and while security is important, we don't want our users thinking the server has timed out every time they log in.
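If you want to pick a cost for your own server rather than take my word for 13, a rough sketch like the following times a single hash at a few cost values. The list of costs and the fixed salt characters are assumptions for illustration, and the timings will differ from machine to machine.

```php
<?php
// Time one bcrypt hash at a few cost values. The costs tried and the fixed
// salt are illustrative assumptions; timings depend on your hardware.
$saltChars = 'abcdefghijklmnopqrstuu'; // 22 characters from ./0-9A-Za-z
$timings = array();

foreach (array(4, 8, 10, 12) as $cost) {
    // The cost must be written as two digits, e.g. "04".
    $setting = '$2a$' . sprintf('%02d', $cost) . '$' . $saltChars;

    $start = microtime(true);
    crypt('test', $setting);
    $timings[$cost] = microtime(true) - $start;

    printf("cost %02d: %.4f seconds\n", $cost, $timings[$cost]);
}
```

Extend the list upwards until a single hash takes about as long as you are willing to make a user wait at login.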

After preparing the salt variable, we append our own value to the end of it to fill it out. The manual says that after the first 7 characters of our salt, which define our hash type and cost, we need 22 acceptable characters to complete the salt. As you can see above, to do this I simply MD5'd the username provided, after making it lowercase so that no matter how it is provided to the function it will produce the same result (this will have implications on how you store usernames, but that's not really in scope for this article). I use MD5 because its hexadecimal output is guaranteed to consist of acceptable characters and is long enough to complete our salt - but any other technique which does the same would be acceptable. It doesn't matter that, combined with our salt setup string, the MD5 of the username ends up longer than 29 characters (7 + 32 = 39) - crypt() will trim it automatically. Also, it doesn't matter that MD5 is an utterly useless password hashing technique - we're only using it to generate the salt, nothing more.
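A quick way to convince yourself of the trimming and the lowercase behaviour is a sketch along these lines. The username and the low cost of 04 are assumptions, picked only to keep the check fast.

```php
<?php
// Low cost (04) is an assumption to keep the demo fast; the article uses 13.
$md5 = md5(strtolower('TestUser'));

// md5() returns 32 lowercase hexadecimal characters, all valid salt characters.
$longSetting  = '$2a$04$' . $md5;                // 7 + 32 characters
$shortSetting = '$2a$04$' . substr($md5, 0, 22); // 7 + 22 characters

// crypt() only reads the first 22 salt characters, so both settings
// should produce the same result.
$sameHash = crypt('test', $longSetting) === crypt('test', $shortSetting);

// Lowercasing means "TestUser" and "testuser" map to the same salt.
$sameSalt = md5(strtolower('TestUser')) === md5(strtolower('testuser'));

var_dump(strlen($md5), $sameHash, $sameSalt);
```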

Now that we have our salt generator function, let's move on to the actual password hash generator function - and this is surprisingly simple as well:

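function generateHash($salt, $password) {
    // crypt() returns 60 characters for bcrypt: 29 of salt string, then 31 of hash.
    $hash = crypt($password, $salt);
    if (strlen($hash) !== 60) {
        return false; // crypt() signals failure with a short string such as "*0"
    }
    // Drop the leading 29 characters so only the hash portion is returned.
    return substr($hash, 29);
}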

The first thing we do here is generate the entire hash which the provided salt defines as being of type CRYPT_BLOWFISH (if it's a salt value returned by the generateSalt() function earlier), and then we remove the first 29 characters (the salt) from the hash, and return what's left of the hash - the actual hashed password part, which is the value that is safe to store in the database.

Combining these two functions, here is how a login check might look on a login form submit:

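// Rebuild the salt from the submitted username instead of reading it from the database.
$salt = generateSalt($_POST['userName']);
$providedPassword = generateHash($salt, $_POST['password']);
$storedPassword = "Your DB logic to obtain the stored hashed password for the attempted username goes here";
// Strict comparison, so PHP never treats the two strings as numbers.
if ($providedPassword === $storedPassword) {
    return "success";
} else {
    return "failure";
}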

See how instead of getting the salt from the database, it is regenerated? A logical, reproducible salt rather than a random salt means we can do this, and it is still unique per user/password combo. If a hacker got your database, they'd still need to have access to your generateSalt() function to know what to do with it. If they got hold of your generateSalt() code, then you're no worse off than you would have been if you stored the salt in the database, which is what you'd be doing if you stored crypt()'s return unmodified.
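The "unique per user" claim can be checked with a small sketch. demoSalt() and demoHash() are hypothetical variants of the two functions above, and the extra $cost parameter is my own assumption so the demo can run at a fast cost of 4 instead of 13.

```php
<?php
// Hypothetical variants of the article's helpers. The $cost parameter is an
// assumption added so the demo can use a fast cost of 4 instead of 13.
function demoSalt($username, $cost) {
    return '$2a$' . sprintf('%02d', $cost) . '$' . md5(strtolower($username));
}

function demoHash($salt, $password) {
    return substr(crypt($password, $salt), 29);
}

// Two users who happen to pick the same password.
$alice = demoHash(demoSalt('alice', 4), 'test');
$bob   = demoHash(demoSalt('bob', 4), 'test');

// Regenerating the salt for the same user reproduces the same stored value.
$aliceAgain = demoHash(demoSalt('Alice', 4), 'test');

var_dump($alice === $bob);        // different salts, so these should differ
var_dump($alice === $aliceAgain); // same salt and password, so these should match
```

Because the two stored values differ, a table of precomputed hashes built for one user is of no use against the other.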

So what about when you want to actually store a password, say during a user registration process? Password-wise, this is even easier than a login check:

$salt = generateSalt($_POST['userName']);
$password = generateHash($salt, $_POST['password']);
// $password now holds the hash portion only, ready to be stored in the database as this user's password

For example, if a user had the username testuser and a password of test, the database password field would have a value of x/5WIA.OjHrE6xHjXKzSD7pkZl2SWdW, assuming 13 is used as the salt's cost value like in the example provided (changing this does change the hash, of course). In this stored password, nowhere do you see the salt - presumably you'd see the username as being associated with this password hash, but without your code the hacker isn't sure how this hash was salted.
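Putting registration and login together, the whole round trip might look like the sketch below. This is only an illustration: the $users array stands in for your database table, and saltFor(), storedValue(), registerUser(), checkLogin() and the $cost parameter are hypothetical names I've introduced, with a low cost of 4 so it runs quickly.

```php
<?php
// A sketch only: $users stands in for the database table, and every function
// name here, along with the $cost parameter, is hypothetical.
function saltFor($username, $cost) {
    return '$2a$' . sprintf('%02d', $cost) . '$' . md5(strtolower($username));
}

function storedValue($username, $password, $cost) {
    // Keep only the 31-character hash portion, as in generateHash().
    return substr(crypt($password, saltFor($username, $cost)), 29);
}

function registerUser(array &$users, $username, $password, $cost) {
    $users[strtolower($username)] = storedValue($username, $password, $cost);
}

function checkLogin(array $users, $username, $password, $cost) {
    $key = strtolower($username);
    if (!isset($users[$key])) {
        return false; // unknown user
    }
    return storedValue($username, $password, $cost) === $users[$key];
}

$users = array();
registerUser($users, 'testuser', 'test', 4);

var_dump(checkLogin($users, 'TestUser', 'test', 4));  // right password
var_dump(checkLogin($users, 'testuser', 'wrong', 4)); // wrong password
var_dump(checkLogin($users, 'testuser', 'test', 5));  // right password, different cost
```

The last call is worth noting: because the cost is part of the salt string that gets stripped off, a value stored at one cost will not match a hash computed at another, so the cost your code uses has to stay in step with the values already in your table.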