Most solutions I’ve seen online compute a simple hash of password + salt for persistence and authentication. The accepted standard looks something like this straightforward solution:

    byte[] Hash(string password)
    {
        byte[] pass = System.Text.Encoding.UTF8.GetBytes(password);
        //Create the salt to use
        byte[] salt = new byte[32];
        new RNGCryptoServiceProvider().GetBytes(salt);
        //Create the hash of password and salt
        HashAlgorithm hashAlgo = new SHA256Managed();
        hashAlgo.TransformBlock(salt, 0, salt.Length, salt, 0);
        hashAlgo.TransformFinalBlock(pass, 0, pass.Length);
        byte[] hash = hashAlgo.Hash;
        hashAlgo.Initialize();
        //Copy the combined salt + hash to a single array
        byte[] result = new byte[salt.Length + hash.Length];
        Array.Copy(salt, result, salt.Length);
        Array.Copy(hash, 0, result, salt.Length, hash.Length);
        return result;
    }

The only deviations I’ve seen from this are either using existing data (say, a primary key in the database) as the salt, or ‘hiding’ the salt at some offset in the result. Both ideas are valid, yet they amount to little more than obfuscation. I don’t recommend either.

Now if you REALLY want to secure your passwords and/or protect against brute-force attacks, there is another approach. It is quite common when generating crypto keys from passwords, but it can just as easily be used for hashing the password: the iteration count introduced in RFC 2898. By introducing an iteration count into the password hash, we increase the computational cost of testing a password by several orders of magnitude. The straightforward way of achieving this in the BCL is as follows:

    byte[] Hash(string password)
    {
        //Create the salt to use
        byte[] salt = new byte[20];
        new RNGCryptoServiceProvider().GetBytes(salt);
        //Combine the salt with the iteration-count
        byte[] iterationBytes = BitConverter.GetBytes(1010);
        byte[] iterationSalt = (byte[])salt.Clone();
        for (int i = 0; i < iterationSalt.Length; i++)
            iterationSalt[i] ^= iterationBytes[i % iterationBytes.Length];
        //Create the hash of password and salt
        DeriveBytes deriveBytes = new Rfc2898DeriveBytes(password, iterationSalt, 1010);
        byte[] hash = deriveBytes.GetBytes(20);
        //Copy the combined salt + hash to a single array
        byte[] result = new byte[salt.Length + hash.Length];
        Array.Copy(salt, result, salt.Length);
        Array.Copy(hash, 0, result, salt.Length, hash.Length);
        return result;
    }

This works very well but is more costly to calculate than the first algorithm. I would not want to use it in a system that verifies passwords on every request (i.e. Basic Auth over SSL); however, for token-auth systems it will prove much more resistant to brute-force attacks. By combining the iteration count into the salt (or key), you force the attacker to repeat the entire iteration sequence for each iteration count attempted. This makes it significantly more difficult for them to steal the password hashes and crack the passwords without also stealing the software or otherwise knowing the iteration count. Even knowing the iteration count, it is now much more expensive to compute each attempt at matching an entry in a password dictionary.
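Verification is the mirror image of the code above: read the stored salt back, fold the iteration count into it the same way, re-derive the hash, and compare. The sketch below assumes the exact storage layout produced by the Hash() method above (20-byte salt followed by the 20-byte derived hash); the `Verify` name is my own, not part of any library:

```csharp
using System;
using System.Security.Cryptography;

static class PasswordCheck
{
    // Mirrors the Hash() routine above: 20-byte salt + 20-byte PBKDF2 hash,
    // with the iteration count XOR-folded into the salt before derivation.
    const int SaltSize = 20, HashSize = 20, Iterations = 1010;

    public static bool Verify(string password, byte[] stored)
    {
        if (stored == null || stored.Length != SaltSize + HashSize)
            return false;
        // Recover the raw salt that was stored in the clear
        byte[] salt = new byte[SaltSize];
        Array.Copy(stored, salt, SaltSize);
        // Fold the iteration count into the salt, exactly as Hash() did
        byte[] iterationBytes = BitConverter.GetBytes(Iterations);
        for (int i = 0; i < salt.Length; i++)
            salt[i] ^= iterationBytes[i % iterationBytes.Length];
        // Re-derive and compare without early exit (avoids a timing side channel)
        DeriveBytes deriveBytes = new Rfc2898DeriveBytes(password, salt, Iterations);
        byte[] hash = deriveBytes.GetBytes(HashSize);
        int diff = 0;
        for (int i = 0; i < HashSize; i++)
            diff |= hash[i] ^ stored[SaltSize + i];
        return diff == 0;
    }
}
```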

Recently I've added the PasswordHash class to wrap up some of this behavior. Note that this class does not combine the iteration count with the salt, yet still provides significant security benefits over a simple hash.

 

So recently I’ve been working heavily with some of the cryptography stuff in .Net and adding a large amount of it to my open source library. One of the many things I needed to perform was simply encrypting and decrypting a piece of data with a password.

It seems everyone out there is using Rfc2898DeriveBytes by now rather than the older PasswordDeriveBytes. Its use is simple enough: construct it with a password, ask it for a number of bytes, and off you go. Things get interesting when you must correctly provide the salt to the derivation algorithm. I didn’t want to deal with this all over my code and thus built the PasswordKey class. It auto-magically prepends the salt to the stream when encrypting data. When decrypting, it reads the salt, generates the key, and then decrypts the data.
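To illustrate the idea, here is a minimal sketch of the salt-prepending pattern. This is not the library’s actual PasswordKey class; the class name, salt size, and iteration count below are my own assumptions:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class SaltedEncryption
{
    public static byte[] Encrypt(string password, byte[] plaintext)
    {
        // Fresh random salt for every encryption
        byte[] salt = new byte[16];
        new RNGCryptoServiceProvider().GetBytes(salt);
        using (var derive = new Rfc2898DeriveBytes(password, salt, 1000))
        using (var aes = Aes.Create())
        {
            aes.Key = derive.GetBytes(32);
            aes.IV = derive.GetBytes(16);
            using (var ms = new MemoryStream())
            {
                // The salt travels in the clear at the front of the stream
                ms.Write(salt, 0, salt.Length);
                using (var cs = new CryptoStream(ms, aes.CreateEncryptor(), CryptoStreamMode.Write))
                    cs.Write(plaintext, 0, plaintext.Length);
                return ms.ToArray();
            }
        }
    }

    public static byte[] Decrypt(string password, byte[] data)
    {
        // Read the salt back first, then re-derive the same key and IV
        byte[] salt = new byte[16];
        Array.Copy(data, salt, salt.Length);
        using (var derive = new Rfc2898DeriveBytes(password, salt, 1000))
        using (var aes = Aes.Create())
        {
            aes.Key = derive.GetBytes(32);
            aes.IV = derive.GetBytes(16);
            using (var ms = new MemoryStream())
            {
                using (var cs = new CryptoStream(ms, aes.CreateDecryptor(), CryptoStreamMode.Write))
                    cs.Write(data, salt.Length, data.Length - salt.Length);
                return ms.ToArray();
            }
        }
    }
}
```

Because the key and IV are derived from the salt that rides along with the ciphertext, the decryptor needs nothing beyond the password itself.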

At first everything went well: it seemed to work and, as per my norm, I quickly got my unit tests pounding on the implementation. Somewhere along the way, though, I must have used the Rfc2898DeriveBytes class in a way that was not expected, because when I began running in release (optimized) mode I started seeing strange results. The tests were generating 32-byte keys with the same password, iteration count, and salt combinations, yet producing different keys. The strangest part was that they only differed after the first 20 bytes. How could this be? If the input were wrong, there is almost no likelihood the first 20 bytes would match, so how did I fail? I looked and looked, googled and googled some more, and read nearly every line of code in the class using Reflector.Net. Nothing. To this day I don’t know why it breaks; I do know I can get two identical 20-byte keys, or two identical 40-byte keys, but I still get two different 32-byte keys. What should I do? Well, I came up with two possible solutions:

1. The first ‘fix’ for this issue resulted in the PBKDF2 class, which derives from Rfc2898DeriveBytes. By overriding the GetBytes() method, I can always pass a value of 20 to the base GetBytes() method:

    public override byte[] GetBytes(int cb)
    {
        byte[] buffer = new byte[cb];
        for (int i = 0; i < cb; i += 20)
        {
            int step = Math.Min(20, cb - i);
            Array.Copy(base.GetBytes(20), 0, buffer, i, step);
        }
        return buffer;
    }
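To make the workaround compile stand-alone, the override needs a constructor around it. The sketch below restates it that way (the constructor signature is my assumption) so two instances built with the same password, salt, and iteration count can be checked for identical output at any requested length:

```csharp
using System;
using System.Security.Cryptography;

// Every request against the base class is for exactly 20 bytes (one SHA-1
// sized block of PBKDF2 output), sidestepping the inconsistency described
// above for other lengths.
class PBKDF2 : Rfc2898DeriveBytes
{
    public PBKDF2(string password, byte[] salt, int iterations)
        : base(password, salt, iterations) { }

    public override byte[] GetBytes(int cb)
    {
        byte[] buffer = new byte[cb];
        for (int i = 0; i < cb; i += 20)
        {
            int step = Math.Min(20, cb - i);
            Array.Copy(base.GetBytes(20), 0, buffer, i, step);
        }
        return buffer;
    }
}
```

With this in place, repeating the failing scenario (two 32-byte keys from identical inputs) should yield identical results.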

2. Then I started debating the value of using a 20-byte hash to generate 32 bytes of data. This algorithm seemed just as applicable with any size hash, right? Sure enough… So the second, and finally preferred, solution was to recreate the algorithm with a pluggable hash algorithm. This gave birth to the HashDerivedBytes&lt;THash&gt; implementation, which accepts THash as any HMAC (keyed hash) derived implementation. With this I’m able to force the use of the managed SHA256 hash routine and raise the iteration count by nearly an order of magnitude without substantial performance loss. In the end I’m pleased with the implementation, and I can verify its compliance with the original algorithm by comparing its output using SHA1 with that of the original Rfc2898DeriveBytes and producing compatible results.
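For reference, the core of RFC 2898’s PBKDF2 generalized over an arbitrary HMAC can be sketched as below. This is my own illustration, not the library’s HashDerivedBytes&lt;THash&gt; (the class and method names are assumptions); plugging in HMACSHA1 should reproduce Rfc2898DeriveBytes output, which is exactly the compatibility check described above:

```csharp
using System;
using System.Security.Cryptography;

static class GenericPbkdf2
{
    // hmac must already be keyed with the password bytes.
    public static byte[] DeriveKey(HMAC hmac, byte[] salt, int iterations, int length)
    {
        int hashLen = hmac.HashSize / 8;
        byte[] output = new byte[length];
        int blocks = (length + hashLen - 1) / hashLen;
        for (int block = 1; block <= blocks; block++)
        {
            // U1 = HMAC(password, salt || INT_BE(block))
            byte[] input = new byte[salt.Length + 4];
            Array.Copy(salt, input, salt.Length);
            input[salt.Length + 0] = (byte)(block >> 24);
            input[salt.Length + 1] = (byte)(block >> 16);
            input[salt.Length + 2] = (byte)(block >> 8);
            input[salt.Length + 3] = (byte)block;
            byte[] u = hmac.ComputeHash(input);
            byte[] t = (byte[])u.Clone();
            // T = U1 ^ U2 ^ ... ^ Uc, where U_j = HMAC(password, U_{j-1})
            for (int iter = 1; iter < iterations; iter++)
            {
                u = hmac.ComputeHash(u);
                for (int k = 0; k < hashLen; k++)
                    t[k] ^= u[k];
            }
            int copy = Math.Min(hashLen, length - (block - 1) * hashLen);
            Array.Copy(t, 0, output, (block - 1) * hashLen, copy);
        }
        return output;
    }
}
```

Swapping in HMACSHA256 changes only the block size; the derivation loop is identical.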

Anyway, if you have problems with the Rfc2898DeriveBytes class I’d like to hear from you, just to make sure I’m not crazy. And now you know two possible solutions if you do ;)

 

Changes in this version:

  • Library.Crypto namespace was added with a fairly complete cryptography API (at least what I needed) including:
  • Library.IO namespace was added to include several new stream derivations including:
    • Added Library.IO.SegmentedMemoryStream for memory streaming while avoiding LOH allocations.
    • Added Library.IO.TempFile to manage temp files and remove them when disposed.
    • Added Library.IO.ReplaceFile to transact replacing a file.
  • Library.Html namespace was added to help with manipulation of html and xhtml:
    • Added Library.Html.XhtmlValidation, which uses the W3C XHTML 1.0 DTDs to validate XHTML files.
    • Added Library.Html.HtmlLightDocument to provide fast DOM parsing of HTML using regular expressions.
    • Added Library.Html.XmlLightDocument to provide fast DOM parsing of XML using regular expressions.
 

This plugin should now be functional on 64-bit installations of TortoiseSVN, use the x64 installation from:

https://github.com/csharptest/JiraSVN/archives/master

Let me know if you have any difficulties.