Design

What is Design?
Design is the result of decisions made both before and during the coding process that affect more than one component in your product/codebase.

What Design is Not?
Needless paperwork produced solely to satisfy some middle-manager’s project milestone.

PasswordHash - how to correctly create a salted password hash

June 8th, 2010

Most people I’ve seen online compute a simple hash of password + salt for persistence and authentication. This is the accepted standard, and a straightforward solution looks like this:

    byte[] Hash(string password)
    {
        byte[] pass = System.Text.Encoding.UTF8.GetBytes(password);
        //Create the salt to use
        byte[] salt = new byte[32];
        new RNGCryptoServiceProvider().GetBytes(salt);
        //Create the hash of password and salt
        HashAlgorithm hashAlgo = new SHA256Managed();
        hashAlgo.TransformBlock(salt, 0, salt.Length, salt, 0);
        hashAlgo.TransformFinalBlock(pass, 0, pass.Length);
        byte[] hash = hashAlgo.Hash;
        hashAlgo.Initialize();
        //Copy the combined salt + hash to a single array
        byte[] result = new byte[salt.Length + hash.Length];
        Array.Copy(salt, result, salt.Length);
        Array.Copy(hash, 0, result, salt.Length, hash.Length);
        return result;
    }

The only deviations I’ve seen from this are to either use existing data (say a primary key in the database) for the salt, or to ‘hide’ the salt at some offset in the result. Both of these ideas are valid yet amount to not much more than a little obfuscation. I don’t recommend either.

Now if you REALLY want to secure your passwords and/or protect against brute force attacks there is another approach. The approach is quite common in generating crypto keys from passwords but can just as easily be used for hashing the password. The concept is expressed in RFC 2898 by the introduction of the iteration count. By introducing this into the password hash we multiply the computation required to test a single password by the iteration count, roughly three orders of magnitude for the count used here. The straightforward way of achieving this in the BCL is as follows:

    byte[] Hash(string password)
    {
        //Create the salt to use
        byte[] salt = new byte[20];
        new RNGCryptoServiceProvider().GetBytes(salt);
        //Combine the salt with the iteration-count
        byte[] iterationBytes = BitConverter.GetBytes(1010);
        byte[] iterationSalt = (byte[])salt.Clone();
        for (int i = 0; i < iterationSalt.Length; i++)
            iterationSalt[i] ^= iterationBytes[i % iterationBytes.Length];
        //Create the hash of password and salt
        DeriveBytes deriveBytes = new Rfc2898DeriveBytes(password, iterationSalt, 1010);
        byte[] hash = deriveBytes.GetBytes(20);
        //Copy the combined salt + hash to a single array
        byte[] result = new byte[salt.Length + hash.Length];
        Array.Copy(salt, result, salt.Length);
        Array.Copy(hash, 0, result, salt.Length, hash.Length);
        return result;
    }

This will work very well but will be more costly to calculate than the first algorithm. I would not want to use this in a system that verifies passwords on every request (i.e. Basic Auth over SSL); however, for token-auth systems this will prove much more resistant to brute force attacks. By combining the iteration count with the salt (or key) you make the attacker repeat the entire iteration sequence for each iteration count attempted. This makes it significantly more difficult for them to steal the password hashes and crack the passwords without also stealing the software or otherwise knowing the iteration count. Even knowing the iteration count, it’s now much more expensive to compute each attempt at matching an entry in their password dictionary.
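To authenticate, the salt has to be pulled back out of the stored result and the same derivation repeated. Here is a minimal sketch of that round trip, assuming the layout and constants of the second Hash method above (20-byte salt, 20-byte hash, 1010 iterations). The class name PasswordHashSketch and its helper names are made up for this illustration, and the using blocks assume .NET 4 or later:

```csharp
using System;
using System.Security.Cryptography;

static class PasswordHashSketch
{
    // Assumed constants, taken from the Hash method above.
    const int SaltSize = 20;
    const int HashSize = 20;
    const int Iterations = 1010;

    // XOR the iteration count into the salt, as the Hash method above does.
    static byte[] MixIterations(byte[] salt)
    {
        byte[] iterationBytes = BitConverter.GetBytes(Iterations);
        byte[] mixed = (byte[])salt.Clone();
        for (int i = 0; i < mixed.Length; i++)
            mixed[i] ^= iterationBytes[i % iterationBytes.Length];
        return mixed;
    }

    // Derive the hash for a password and the stored (unmixed) salt.
    static byte[] Derive(string password, byte[] salt)
    {
        using (var deriveBytes = new Rfc2898DeriveBytes(password, MixIterations(salt), Iterations))
            return deriveBytes.GetBytes(HashSize);
    }

    // Produces salt + hash in a single array.
    public static byte[] Hash(string password)
    {
        byte[] salt = new byte[SaltSize];
        using (var rng = RandomNumberGenerator.Create())
            rng.GetBytes(salt);
        byte[] hash = Derive(password, salt);
        byte[] result = new byte[SaltSize + HashSize];
        Array.Copy(salt, result, SaltSize);
        Array.Copy(hash, 0, result, SaltSize, HashSize);
        return result;
    }

    // Re-derives the hash from the stored salt and compares the two.
    public static bool Verify(string password, byte[] stored)
    {
        if (stored == null || stored.Length != SaltSize + HashSize)
            return false;
        byte[] salt = new byte[SaltSize];
        Array.Copy(stored, salt, SaltSize);
        byte[] hash = Derive(password, salt);
        // Compare every byte so the time taken does not depend on
        // where the first mismatch occurs.
        int diff = 0;
        for (int i = 0; i < HashSize; i++)
            diff |= hash[i] ^ stored[SaltSize + i];
        return diff == 0;
    }
}
```

Calling PasswordHashSketch.Verify(password, stored) with the array returned by Hash should succeed for the original password and fail for any other.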

Recently I’ve added the PasswordHash class to wrap up some of this behavior. Note that this class does not combine the iteration count with the salt, yet still provides significant security benefits over a simple hash.

Visual Studio 2010 time to upgrade…???

April 23rd, 2010

Well just got the release version of VS2010 installed and all I can say is “OMG”.  Not an OMG as in “OMG that is so cool!”, but more like an “OMG, are you serious?”.
 
Several things are standing out as wickedly wrong before I even open a solution…
 
My hardware is no longer sufficient.  I’m running on an HP Pavilion notebook about 2 years old: a dual-core 2.2GHz (3GB RAM) with Vista SP2 32-bit.  Just moving my mouse through menus and around the UI, dragging windows, etc. is painful and eats 30-50% of my CPU.  Now I know it’s not a stellar machine, but my mouse jumping several inches at a time is crazy unusable.  Since I don’t have the time or money to buy new hardware right now, I’ve disabled Vista Aero and reverted to the Windows 2000 look and feel.  Things are moving much better now… almost usable, if ugly.  The scrolling speed of source code is still nasty slow. So word to the wise: if you’re scoring below 3.0 on the “Windows Experience Index” for “3D business and gaming graphics performance”, you’re probably in for a hardware update.
 
Tabbed documents, I hate them.  Unfortunately for me, Microsoft is cramming another user experience change down my throat.  Like that @#^*ing ‘Ribbon bar’ interface in Office, there is no way to turn it off in VS2010.  (I actually quit using MS Office and switched to OOo to avoid ribbons).  It’s cool that you can make the source window float out of the app, but I’m still going to desperately miss the good-old-fashioned MDI view.
 
I’m sure that over the coming days, weeks, and years I’ll find much more to hate and love about VS2010, but so far I’m not a fan. I guess I’ll go take some time to get to know VS2010 a little better and see if my initial reaction changes.  I hate to think what the next GUI design change from MS will entail for me… I guess I’m just old and set in my ways.
 
LOL, at least I’m not the only one: What happened to MDI capability in the editor?

The exponential cost of doing things “Right”

February 23rd, 2010

Sometimes a picture is worth a thousand words…

I simply ‘cringe’ every time I hear a developer utter the words “I’m going to do this RIGHT!”. Why do we use the words “Right way” and “Wrong way” when describing things? The fact is that there is a lot more than just one “Wrong way” to do something, trust me on this. Further there are a vast multitude of “Right ways” to do anything in software. The simple idea that you know and grasp what is the “Perfect” way of doing something is absurd. Take your pick of a software demi-god and ask them if they ever strove to write something “Right” only to figure out by the end it wasn’t the “Right way”.

So here is a quote of mine that sums it up:

The closer you get to your perfect solution the more your idea of the perfect solution changes and thus the perfect solution cannot be obtained.

We need to recognize that the world of software is not black and white. It is instead an infinite number of shades between the two. The problem we should concern ourselves with is not how to do things “Right”, but rather ask ourselves what is “Good Enough”.

We first recognize the term “Good Enough” to be something that cannot be defined in a concrete way. The term refers to one of those gray areas between black and white and has no consistent definition. As such, for every project, for every class or method, for every line of code we have to redefine “Good Enough”. There are a number of supporting principles of software engineering, like YAGNI, SOLID, TDD, etc., that help us arrive at this elusive definition.

For those that want to see a concrete example, this is “Good Enough”:

	public static void Main(string[] args)
	{
		DoSomething(args[0], args[1], int.Parse(args[2]));
	}

… good enough, that is, for an internal tool that will only ever be called from a single build script. I’ve done this several times on small utilities and I’d argue in those cases there is nothing wrong with it. Yet if you needed to do the same thing for a customer-facing utility, the same routine might need several classes and a hundred or so lines of code behind it.
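To give a feel for the other end of the scale, here is a rough sketch of just the first step toward a customer-facing version: validating the arguments and reporting a usable error. The argument meanings (source, target, count), the class name ArgsSketch, and the DoSomething body are assumptions made up for the illustration, since the post never says what DoSomething does:

```csharp
using System;

static class ArgsSketch
{
    // Hypothetical stand-in for the DoSomething call above.
    static void DoSomething(string source, string target, int count)
    {
        Console.WriteLine("{0} -> {1} x{2}", source, target, count);
    }

    // Returns null when the arguments are usable, otherwise an error message.
    public static string Validate(string[] args, out int count)
    {
        count = 0;
        if (args == null || args.Length != 3)
            return "usage: tool <source> <target> <count>";
        if (args[0].Length == 0 || args[1].Length == 0)
            return "source and target must not be empty";
        if (!int.TryParse(args[2], out count) || count < 0)
            return "count must be a non-negative integer";
        return null;
    }

    // Returns a process exit code: 0 on success, 1 on bad arguments.
    public static int Run(string[] args)
    {
        int count;
        string error = Validate(args, out count);
        if (error != null)
        {
            Console.Error.WriteLine(error);
            return 1;
        }
        DoSomething(args[0], args[1], count);
        return 0;
    }
}
```

Already that is many times the size of the one-liner, and it still has no help text, logging, or localization. That is the cost curve in miniature.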

In summary, next time you start writing code ask yourself “where on the exponential graph of cost vs. quality do I want to land?”. If I’m writing a routine to keep a heart-implant pumping blood inside someone’s body I’m probably going to want to put some effort into it. However, if I’m writing an argument parser for a command-line app used exclusively by our build then the opposite extreme would apply.

System.Object good idea or bad?

January 22nd, 2010

There are three virtual methods that IMHO should have never been added to System.Object…
• ToString()
• GetHashCode()
• Equals()
All of these could have been implemented as an interface. Had they done so I think we’d be much better off. So why are these a problem?

First let’s focus on ToString():
1. If someone calling ToString() and displaying the results expects it to be implemented, you have an implicit contract that the compiler cannot enforce. Your code might assume that ToString() is overridden, but there is no way to force that to be the case. Had this been an interface, say IDisplayString, your client code would be able to ensure that an object implements it.

2. Most argue the benefit of overriding ToString() is for the debugger. You should NOT be doing this; instead you should start using [System.Diagnostics.DebuggerDisplayAttribute].

3. As for needing this implementation for converting objects to strings via String.Format() and/or Console.WriteLine, they could have deferred to System.Convert.ToString(object) and checked for something like ‘IDisplayString’, falling back to the type’s name if not implemented.

4. Exactly what ToString() returns or should return seems to be a matter of debate and is the worst of all its problems. Is this a display string? If so, for what culture? Is this a serialization string? Is it just for debugging? There doesn’t seem to be any correct answer.
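A sketch of how points 1 through 3 might look in code. IDisplayString is the hypothetical interface named above and does not exist in the BCL; the Customer and DisplaySketch names are likewise invented for the example. DebuggerDisplayAttribute is the real attribute from System.Diagnostics:

```csharp
using System;
using System.Diagnostics;

// Hypothetical interface; nothing like it exists in the BCL.
interface IDisplayString
{
    string ToDisplayString();
}

// DebuggerDisplay controls what the debugger shows for an instance,
// without involving ToString() at all.
[DebuggerDisplay("Customer {Id}: {Name}")]
sealed class Customer : IDisplayString
{
    public int Id { get; set; }
    public string Name { get; set; }

    public string ToDisplayString() { return Name; }
}

static class DisplaySketch
{
    // Roughly what a conversion routine could have done: use the interface
    // when present, otherwise fall back to the type's name.
    public static string Describe(object value)
    {
        if (value == null) return string.Empty;
        var display = value as IDisplayString;
        return display != null ? display.ToDisplayString() : value.GetType().FullName;
    }

    // The generic constraint lets the compiler enforce the contract:
    // arguments that do not implement IDisplayString will not compile.
    public static void Print<T>(T value) where T : IDisplayString
    {
        Console.WriteLine(value.ToDisplayString());
    }
}
```

With this shape, DisplaySketch.Print(new object()) is a compile-time error rather than a silent "System.Object" on someone’s screen.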

And don’t even get me started on GetHashCode/Equals.
1. Overriding .Equals has nothing to do with expressing equality in your code (value1 == value2) unless you remember to call it directly. Stupid!

2. There are numerous ways in which to compare two objects. Reference comparison is one, or perhaps by an object’s identity field, or even a name field? What I love most is that the reference comparison is difficult to produce on an object that overrides these. Does anyone know how to get the ‘hash’ of an object’s reference (memory location)? BTW you can; it’s not in an obvious place like Object.GetHash(object), it’s RuntimeHelpers.GetHashCode(object).

3. Casting: why, lord, why? Why must you make me cast the object I’m comparing myself to? And what do I do with null or some other object type? Is there some standard I’ve never heard of that defines the appropriate action? Certainly neither equality nor inequality is the correct response?

4. What comparison will be used? I’ve implemented IEquatable<T> before, and sometimes the things I call even use it; however, most of the time I still have to override the object’s GetHashCode/Equals.

5. Much like ToString(), there isn’t a way to tell if an object actually implements this behavior. If it does, whether I wanted it to is an entirely different story. They could be breaking my code by implementing these without either of us knowing I was depending upon reference comparison.
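A small sketch of several of these complaints in one place. The Point and ReferenceComparer types are invented for the example; RuntimeHelpers.GetHashCode and Object.ReferenceEquals are the real BCL calls:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

sealed class Point : IEquatable<Point>
{
    public readonly int X, Y;
    public Point(int x, int y) { X = x; Y = y; }

    // The typed comparison from IEquatable<T>.
    public bool Equals(Point other)
    {
        return other != null && X == other.X && Y == other.Y;
    }

    // The untyped overrides are still needed, cast and all. Here null and
    // foreign types simply compare as "not equal".
    public override bool Equals(object obj) { return Equals(obj as Point); }
    public override int GetHashCode() { return unchecked(X * 397 ^ Y); }
}

// Recovers reference semantics for a type that overrides Equals/GetHashCode.
sealed class ReferenceComparer<T> : IEqualityComparer<T> where T : class
{
    public bool Equals(T x, T y) { return ReferenceEquals(x, y); }
    public int GetHashCode(T obj) { return RuntimeHelpers.GetHashCode(obj); }
}

static class EqualityDemo
{
    public static void Run()
    {
        var a = new Point(1, 2);
        var b = new Point(1, 2);

        Console.WriteLine(a.Equals(b)); // True: value comparison
        Console.WriteLine(a == b);      // False: == still compares references

        // The default comparer uses the overrides, so b counts as a duplicate.
        var byValue = new HashSet<Point> { a, b };
        Console.WriteLine(byValue.Count); // 1

        // With the reference comparer both instances are kept.
        var byReference = new HashSet<Point>(new ReferenceComparer<Point>()) { a, b };
        Console.WriteLine(byReference.Count); // 2
    }
}
```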

So I say enough already, can we please just deprecate these already? I’d even be willing to find and fix all the places I’ve been unwillingly forced to use these damn things.

When do you Optimize your code?

November 23rd, 2009

The answer is simple… only when the profiler tells you to.

Optimizations often make code less reliable and often constrain implementations making them less flexible. Good software performance is not created by making lots of micro optimizations throughout your code. A good design from the architectural point of view is the critical key to success for performance.

Why would I say this? Why are optimizations bad? Allow me to demonstrate a few types of optimizations that should only be done after careful consideration and profiling:

  1. Lazy Loading: I hate the term as it’s not a lazy approach. Go with the lazy approach and don’t do this until it’s a problem. Writing a property to have strange side-effects on an object is just a bad idea to begin with. If the performance of loading a collection of data is so bad that you must avoid it, then use a method that does not cache the result. Make it clear to callers that the data is not cached but loaded on each call. For instance, if you want a collection of Students from a Teacher object, call the method “FetchStudents()” rather than “GetStudents()”.
  2. Caching: WTF, so many people think this is some kind of great idea. Caching is evil; it overcomplicates code and causes strange side-effects all over your code. Just to be clear, I’m not talking about caching of computational results in a data store, I’m talking about caching results of data store requests. Don’t ever do it… ever. A noteworthy exception to the rule is dealing with read-only or write-once data.
  3. Micro Optimizations: This is hard to define but let’s review a few examples on Stack Overflow. See “Array more efficient than dictionary”, “Are doubles faster than floats”, or “Is if-else faster than switch”. These kinds of optimizations are just silly. Write readable, maintainable, testable code first, then analyse the code with a profiler if you have a problem. I loved Jon Skeet’s response to the first of these questions, entitled “Just how spiky is your traffic”.
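The naming advice from the first item could look something like the following. Everything here (IStudentStore, InMemoryStudentStore, and the Teacher shape) is a hypothetical sketch, not code from a real project:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data-access abstraction, named for this sketch only.
interface IStudentStore
{
    IList<string> LoadStudents(int teacherId);
}

// In-memory stand-in that counts how often it is hit.
sealed class InMemoryStudentStore : IStudentStore
{
    public int Calls;

    public IList<string> LoadStudents(int teacherId)
    {
        Calls++;
        return new List<string> { "Alice", "Bob" };
    }
}

sealed class Teacher
{
    private readonly IStudentStore _store;
    public int Id { get; private set; }

    public Teacher(int id, IStudentStore store)
    {
        Id = id;
        _store = store;
    }

    // A method named "Fetch" rather than a Students property: callers can see
    // that every call goes back to the store and that nothing is cached.
    public IList<string> FetchStudents()
    {
        return _store.LoadStudents(Id);
    }
}
```

Two calls to FetchStudents() mean two trips to the store, and the name makes that cost visible at the call site.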

 

Enough about what not to do… what should you do? Always be aware of the cost of the algorithm you are choosing. Choosing the right data structure and/or algorithm at the onset of coding can result in highly predictable and scalable software. Conversely, failing to do so can cause non-linear performance as the volume of data increases, eventually yielding unusable software. We’ve all seen this happen in a database application with a non-indexed query. This can happen in your own C# code too, so don’t be an idiot, but don’t spend all your time focused on the fastest approach. Look instead for a predictable linear-growth solution that fits your needs.
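As a small illustration of the difference, here are two ways to find the duplicated values in a list. The DuplicateFinder class is invented for the example. Both methods return the same answer, but the first does a linear scan per item and so grows quadratically, much like the non-indexed query, while the second stays roughly linear:

```csharp
using System;
using System.Collections.Generic;

static class DuplicateFinder
{
    // O(n^2): List.Contains is a linear scan, repeated for every item.
    public static List<int> Quadratic(IList<int> items)
    {
        var seen = new List<int>();
        var dupes = new List<int>();
        foreach (int item in items)
        {
            if (seen.Contains(item))
            {
                if (!dupes.Contains(item)) dupes.Add(item);
            }
            else
            {
                seen.Add(item);
            }
        }
        return dupes;
    }

    // O(n) on average: HashSet.Add is constant-time on average and
    // returns false when the item was already present.
    public static List<int> Linear(IList<int> items)
    {
        var seen = new HashSet<int>();
        var reported = new HashSet<int>();
        var dupes = new List<int>();
        foreach (int item in items)
        {
            if (!seen.Add(item) && reported.Add(item))
                dupes.Add(item);
        }
        return dupes;
    }
}
```

Neither version is a micro-optimization; the second is simply the data structure that matches the question being asked.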

Remember there are three competing interests in any piece of code: Flexibility, Reliability, and Performance. Any two of these can be achieved at a reasonable level at the sacrifice of the third, or you can excel at any one at the cost of the other two. The choice is yours to make and will change from task to task and project to project. I call this the ‘Software Quality Triangle’ because of its similarity to the age-old project manager’s triangle (Cost, Time, or Features). By graphically representing each concept as a point of an equilateral triangle and positioning the axis of rotation at its center, you can gain perspective on what you will sacrifice versus what you will gain. For instance, by rotating the triangle so that Performance is pointing straight up (its highest level), the other two points (Flexibility and Reliability) will drop.

97 Things Every Software Architect Should Know - The Book

April 6th, 2009

http://97-things.near-time.net/wiki/97-things-every-software-architect-should-know-the-book

This wiki was developed and published in a book available from O’Reilly Press. So far it’s a great read put together by the collective experiences of many seasoned developers. My own favorite so far is: Simplicity before generality, use before reuse.

I think the idea of capturing this all by wiki is entirely awesome. I just wish they would keep it going; I’m certain there are more than 97 things I should know ;)

BTW, Thanks to From 9 till 2 for bringing this to my attention.

Simple is always better

March 18th, 2009

This is the one, the only, the quintessential rule of programming.  Also known as the KISS method (Keep It Simple, Stupid), these are, to me, words to live by.  From the implementation of a simple class to the design of a complex application, you constantly have to keep this at the forefront of your mind.

Far too often I see the most talented of developers letting themselves get carried away with something they are working on. This inevitably results in the worst kind of code: over-engineered, over-complicated, and under-tested. I presume this is caused by their own misguided love of a project or component they are working on. Open source software often exhibits this; I would guess this is due in part to these projects being developed by those who love the project.

I have often been guilty of this pleasurable curse myself. Only two things have been effective at curtailing this innate behavior: first, a constant reminder of KISS, and second, self-discipline in extensive unit testing. Honestly, the latter serves better than the former… having to test everything you write often kills unnecessary and frivolous features. I would guess the best example I have of KISS is the silly little ‘Check.cs’ file I created. Since I have not needed more than its 3 or 4 methods, I have not written more. I have not added to this class, and not because I can’t imagine a whole host of other routines that could be added and perhaps be useful *someday*. Rather, I have very intentionally not added anything I did not have an immediate need for.
 
For those of you still confused: Simple Code == Less Code == Less Bugs

API Design Values

March 18th, 2009

Found an interesting post over on ISerializable - Roy Osherove’s Blog that outlines a few of the things that should be considered when designing software component interfaces.  To re-summarize and add my own two cents’ worth, here they are in my own order of importance.

  1. Resilience – (the best single-word description I can come up with) Your API must, by necessity, provide no details about its implementation. Without consideration of this objective it can prove difficult to re-tool when the time comes.
  2. Independence – You should generally avoid binding your API/interfaces to external components. Most especially avoid anything that determines the environment the code must run within (using System.Windows.Forms, System.Web, System.Console etc). Of note, this applies while coding as well as during interface design: avoid use of global accessors (i.e. System.Web.HttpContext.Current) provided by the aforementioned namespaces.
  3. Explicitness – we decided early on to be as explicit as possible about the API, so that the least guessing needs to take place by the user.
    Consistency – is the new API consistent with the other APIs that already exist, or does it go against the regular way things are done?
  4. Discoverability – If a user knew they wanted to do thing X, would they know to use the API without help? Simply from IntelliSense?
    Single point of entry – everything should start from a single point (Isolate.Something())
  5. Readability – for the reader, not the writer (if you didn’t write the test would you find it easy to understand what it does?)
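To make the Independence point a little more concrete, here is a sketch of one way to keep a component free of System.Web: have it ask for what it needs through a small interface instead of reaching for HttpContext.Current. The IUserContext, AuditLog, and FixedUserContext names are all hypothetical, introduced only for this illustration:

```csharp
using System;

// Hypothetical abstraction; the component depends on this and nothing else.
public interface IUserContext
{
    string UserName { get; }
}

// Knows nothing about web, console, or WinForms hosting.
public sealed class AuditLog
{
    private readonly IUserContext _user;

    public AuditLog(IUserContext user)
    {
        if (user == null) throw new ArgumentNullException("user");
        _user = user;
    }

    public string Format(string action)
    {
        return _user.UserName + ": " + action;
    }
}

// Trivial implementation for tests or a console host. A web host would
// supply its own implementation at the edge of the application.
public sealed class FixedUserContext : IUserContext
{
    private readonly string _name;

    public FixedUserContext(string name) { _name = name; }

    public string UserName { get { return _name; } }
}
```

The same AuditLog can then run under a unit test, a service, or a web request without change, which is the point of the rule.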

 

I’ve stricken the following from my own consideration for the reasons stated:

Single way to achieve things – is there more than one way to do a task with the new API?
Not always a good thing; use your judgement, and “keep it simple” will prevent this from being a problem.

Backwards compatibility – do we break an existing feature and cause some heartache for users?
See “Resilience” above; you already failed once, so be more careful next time.

Remember all of this basically helps to define the only phrase that matters:

“Keep it Simple”