Testing

 

Software testing is any activity aimed at evaluating an attribute or capability of a program or system and determining that it meets its required results.

– Hetzel, William C., The Complete Guide to Software Testing, 2nd ed.

So what about that statement does not sound like your job? Testing software, every time it changes, is software development. Without testing, all that code you wrote is useless to someone who expects it to work. Without automated testing, that bug you fixed is more risk than reward.

 

Recently I published a set of B+Tree benchmarks comparing a few of the available libraries against my own C# BPlusTree implementation. The author of RaptorDB wrote to me and requested that I review the latest version, v1.7, of RaptorDB.

The points you made regarding RaptorDB were all valid in the previous version (pre the current v1.7), I would much appreciate if you could test again with the new version and tell me the results in comparison.

The Good:
I will say this version does perform better on seeks and is now reading at around 180k records per second. RaptorDB now runs to completion in both single-threaded and multi-threaded benchmarks, which is a nice improvement.

The Bad:
The problem with benchmarks is that they must assume the library is doing what it is told to do; validating every operation would skew the results. After running the latest version (1.7) in a key-value test harness I found that the multi-threading support is still very broken and that the library still corrupts the store in a single-threaded test.

The Ugly:

  1. The new Shutdown() method does not shut the storage down and now causes exceptions when the domain unloads.
  2. Missing use of thread synchronization primitives. Rather than using locks and signals, this library uses mutable properties and Thread.Sleep() (see the sketch after this list).
  3. This library is so far from being thread-safe that it’s really kind of scary they claim it is. There is no locking or synchronization of any kind around object state manipulation.
  4. Critical state stored exclusively in volatile memory. Several things are managed and stored in-memory only. This means you’re going to have issues if your process crashes, even when not actively using the store.
  5. The use of a polling thread to index the data is really unsettling, especially as there is no way to control its behavior or scope. As I said, I believe this is the most significant design flaw.
  6. AFAIK from reading the code there is no way to enumerate only the ‘current’ records; rather, you are forced to see all previous versions.
  7. Inconsistent use of int/long when dealing with the number of records. Basically you’re limited to int.MaxValue records, including all previous editions of records.
  8. Manipulation of lists that are also enumerated, without any synchronization.
  9. I’m always concerned by finding “GC.Collect(2);” and even “Thread.Sleep(100); // breather” in any library. Forcing collection is not a fix for abuse of the LOH, nor does your computer need an arbitrary break from your code.
  10. Console logging left active and no way to disable it. This is another nasty thing to do in a library; logging should be routed through an event, or even a TextWriter property, rather than directly accessing Console methods.
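To make point #2 concrete, here is a minimal sketch of what replacing a sleep-polled boolean with a real signal looks like. The names are hypothetical, not RaptorDB’s actual API; the point is that an indexing thread should block on an event instead of burning cycles in a Sleep() loop:

// Requires: using System.Threading;
class Indexer
{
	// Polling style (roughly what the library does today):
	//     while (_PauseIndex) Thread.Sleep(100);
	// Signal style: the thread blocks until it is actually told to resume.
	private readonly ManualResetEvent _resume = new ManualResetEvent(true);

	public void Pause() { _resume.Reset(); }
	public void Resume() { _resume.Set(); }

	public void IndexLoop()
	{
		while (true)
		{
			_resume.WaitOne(); // blocks here; no Sleep(), no busy-wait, no race
			// ... index one batch of records under a lock ...
		}
	}
}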

 
You might be surprised to hear I am very hopeful for this library. From the above commentary you might think I’m just trying to bash it for no apparent reason; however, you would be wrong. I’m being harsh, I agree, but I’m trying to communicate to this author, and to the community, what things could be improved upon when publishing an open source library.

Suggestions: I think RaptorDB would best be served by first focusing on single-threaded access and making that use-case work well. I would get rid of the ‘Indexing’ thread and index on insert, even if only in memory. Remove every occurrence of Thread.Sleep(), Console.Write(), lock(), and the ‘thread control’ booleans _PauseIndex, _isInternalOPRunning, etc. Remove the uses of List, using a custom linked-list if necessary. Make sure that the log state is persisted. Fix the enumeration to correctly enumerate the key/value pairs. Replace int with long where appropriate, especially in record numbers and B+Tree node references and values. Move the code-base to a source-control server (code.google.com, github.com, codeplex.com, anywhere the community can contribute). Then test, test, and test some more…

Mostly I want to impress upon you the need for testing. I’ve spent hundreds of hours testing the BPlusTree. The NUnit test suite has over 100 tests and touches every single method, public or private. I’ve manually reviewed the coverage and made certain that all major if/else branches are being exercised. I’ve written countless test harnesses allowing it to run non-stop for days on end. I sincerely hope RaptorDB will someday receive the same level of attention to testing and verification, as I believe it shows great potential.

All things considered this library still fits in the ‘interesting but unusable’ category.

 

No idea how I found myself on this blog; however, it was an interesting read: Why Is 100% Test Coverage Easier To Achieve?

Although the translation is a little difficult to read, he does an excellent job of pointing out the obvious benefits of 100% functional coverage. Let’s recap his main points…

1. First he mentions the obvious transition from “do we really need to test it” to the eventual “did we really need to write it”. By requiring 100% functional coverage on a code base you eliminate waste. Code that is not accessible via public interfaces is usually one of the first things that disappears. This pruning of dead-code can be very beneficial as the project ages.

2. The second benefit he discusses is that it reduces the likelihood for coverage to be reduced over time due to release pressure. For instance if it’s acceptable to have 20% of your code untouched by automated testing, then why is 21% so bad? or 23%? or 25%? or 35%? Release after release you lower your bar to meet expectations.

While these are certainly some of the key benefits, there are other benefits I enjoy by continuing to release the library at 100% functional coverage. So here are the things I would add to his original two points:

3. Over time it makes developers more conscious about over-engineering a solution. Since they know that they will be required to test every method, they tend to keep their designs simple yet sufficient.

4. Rewrite anything, delete anything. One key problem that exists in partially covered code is that refactoring a low-level component can be dicey. Maybe all the pieces that depend on you have adequate coverage, and maybe they don’t. Making every method important to test makes it far less likely that a refactored piece of code introduces bugs.

5. Totally saves time. If you have read this blog, you know I’ve been writing a B+Tree. Among the many things it relies on are storage layers and locking factories. By building and testing these in isolation first, before integrating them with the intended use-case, I save tons and tons of time. A subtle bug in a lock would play havoc on multi-threaded code; how much time would it take you to figure out that the root cause of a concurrency issue was a race condition in a lock? This is why testing is important to me: I hate wasting my time.
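As a concrete illustration of this point, here is a minimal NUnit sketch of testing a locking layer in isolation. ReaderWriterLockSlim stands in for my locking factories here (an assumption for brevity, not the BPlusTree API); the idea is to prove exclusivity before any multi-threaded integration test depends on it:

// Requires: using System.Threading; using NUnit.Framework;
[TestFixture]
public class LockIsolationTests
{
	[Test]
	public void WriterExcludesCompetingWriter()
	{
		var rw = new ReaderWriterLockSlim();
		rw.EnterWriteLock();
		try
		{
			// A competing writer on another thread must fail immediately.
			// If this ever succeeds, the lock is broken and every
			// multi-threaded test built on top of it is meaningless.
			bool acquired = true;
			var thread = new Thread(() => acquired = rw.TryEnterWriteLock(0));
			thread.Start();
			thread.Join();
			Assert.IsFalse(acquired);
		}
		finally
		{
			rw.ExitWriteLock();
		}
	}
}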

6. Stuff just works… It seems like every time I take a reasonably complicated class from cursory usage coverage to 100% functional coverage, I find something broken. An average of 60-80% functional coverage translates to only around 50% statement coverage, and I don’t know about you, but I can fit a whole lot of bugs in just half the code I write. Now the inverse: when you forcibly hit 100% functional coverage, you wind up with somewhere between 90-95% statement coverage.

For me and this library, most of this remaining uncovered code looks something like “throw new XxxException()” or “return -1”, etc. In fact, out of the 14,328 statements in the library only 710 are not touched by a unit test. Of those 710 lines, only 263 remain after you throw out the set that contains ‘throw’, ‘catch’, ‘return’, ‘Trace’, and ‘Log’. Even a manual review of the remaining 263 shows that the majority is also error handling. So what’s the point? The point is that pushing to, and keeping, 100% functional coverage helps ensure that positive test cases are in place.
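Incidentally, that ‘throw out the noise’ pass is trivial to script. A hypothetical sketch, assuming your coverage tool can export the uncovered source lines to a plain text file (the file name and format here are assumptions, not any real tool’s output):

// Requires: using System; using System.IO; using System.Linq;
static void ReportInterestingUncoveredLines()
{
	// Keywords that mark an uncovered line as error handling or logging noise.
	string[] noise = { "throw", "catch", "return", "Trace", "Log" };

	var interesting = File.ReadAllLines("uncovered.txt")
		.Where(line => !noise.Any(line.Contains))
		.ToList();

	Console.WriteLine("{0} uncovered lines worth a manual review", interesting.Count);
}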

Disclaimer:
Does 100% functional coverage mean anything? No. Clearly, and we all know this to be true, making a function execute is not the same thing as testing it. Simply having 100% functional coverage is meaningless without the discipline and maturity of all the developers involved. If they are not 100% excited about 100% functional coverage then that goal will fail… one way or another. Even with disciplined developers bugs will still exist; that, unfortunately, doesn’t change.

However, if your whole team is excited about and serious about achieving 100% functional coverage, it should become a check-in requirement. Just like you use CruiseControl to make sure every check-in builds and deploys, make certain your build fails if coverage is not at 100%. Remember, it’s also important to find or write the tools up front to make locating the missing coverage quick and easy.

 

How to throw the InnerException of a TargetInvocationException without losing stack details?

I’ve been plagued by this problem for some time. There are a few common solutions to this:

  1. Option #1 – just leave it alone. The downfall here is that specific types of exceptions cannot be caught easily in calling code. In some instances this can be a very big problem.
  2. Option #2 – re-throwing the InnerException property. This at least preserves the type of the exception, and thus code above you in the call stack will correctly catch and handle exceptions. The problem here is that the stack information previously held in that exception is lost.
  3. Option #3 – avoiding the problem. If you know the types of the calling parameters you can construct a delegate from the MethodInfo. By calling the delegate (not using DynamicInvoke) the issue is avoided. Again, this only works if you have compile-time knowledge of the parameters (see the sketch after this list).
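For completeness, here is a minimal sketch of option #3. int.Parse is just an arbitrary stand-in for a method whose signature you know at compile time:

// Requires: using System; using System.Reflection;
MethodInfo method = typeof(int).GetMethod("Parse", new[] { typeof(string) });

// Bind the MethodInfo to a typed delegate; this only compiles because the
// parameter and return types are known up front.
var parse = (Func<string, int>)Delegate.CreateDelegate(typeof(Func<string, int>), method);

Console.WriteLine(parse("42"));
// parse("oops") would throw FormatException directly, stack intact,
// instead of a wrapping TargetInvocationException.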

 
Most of the time one of the above has been an acceptable solution; however, recently I ran into a case where none of them would work. The code has been around a while and uses option #2 above, since the arguments are unknown. Changing the behavior to not throw the original exception type was out, since it could break existing client code. The problem that was killing me was the loss of the stack when debugging and monitoring log information. The loss of this information was making me spend hours trying to figure out where the thing failed.

So I needed to use MethodInfo.Invoke(), and I needed the stack and the original exception type to be preserved… but how?

Well, the first thing I came up with is the following routine, which gets down-and-ugly with the Exception class. The catch-block extracts the InnerException from the TargetInvocationException. Then, using the serialization helpers, it copies the fields of the exception to an array. Now we re-throw the exception to set it as the ‘current’ exception for the next argument-less throw. And finally, after we’ve lost our stack, we stuff it back in by calling the serialization helper again to push all the fields back into the exception before calling throw one last time.

Bad Code, do not use

// Requires: using System; using System.Reflection; using System.Runtime.Serialization;
[System.Diagnostics.DebuggerNonUserCode]
[System.Diagnostics.DebuggerStepThrough]
private static void Invoke(MethodInfo method, Object target, params Object[] invokeArgs)
{
	try
	{
		method.Invoke(target, invokeArgs);
	}
	catch (TargetInvocationException te)
	{
		if (te.InnerException == null)
			throw;
		Exception innerException = te.InnerException;

		// Snapshot every serializable field of the original exception,
		// including the internal fields that hold the stack trace.
		MemberInfo[] fields = FormatterServices.GetSerializableMembers(innerException.GetType());
		object[] values = FormatterServices.GetObjectData(innerException, fields);

		// Re-throw to make it the 'current' exception, then restore the saved
		// fields (stack included) before the final argument-less throw.
		try { throw innerException; }
		catch (Exception exception)
		{
			FormatterServices.PopulateObjectMembers(exception, fields, values);
			throw;
		}
	}
}

This all worked well… However, I started to wonder how this might affect some types of exceptions. I started thinking maybe I should serialize and deserialize the object first. I started thinking I was making this too complicated just to preserve a stack trace.

I finally just started reading the Exception code and found they have exactly what I want already baked in… just not exposed. The method preserves only the stack, nothing else… perfect. So why not solve a reflection problem with reflection:

// Requires: using System; using System.Reflection; using System.Threading;
[System.Diagnostics.DebuggerNonUserCode]
[System.Diagnostics.DebuggerStepThrough]
private static void Invoke(MethodInfo method, Object target, params Object[] invokeArgs)
{
	try
	{
		method.Invoke(target, invokeArgs);
	}
	catch (TargetInvocationException te)
	{
		if (te.InnerException == null)
			throw;
		Exception innerException = te.InnerException;

		// Bind the framework's internal Exception.InternalPreserveStackTrace()
		// to a ThreadStart delegate; the final 'false' means no exception on a
		// bind failure, so this degrades gracefully if the method is missing.
		ThreadStart savestack = Delegate.CreateDelegate(typeof(ThreadStart), innerException, "InternalPreserveStackTrace", false, false) as ThreadStart;
		if (savestack != null) savestack();
		throw innerException; // now we can re-throw without trashing the stack
	}
}
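To illustrate, here is a hypothetical usage; Worker is a made-up example class, not anything from a real library:

class Worker
{
	public void Run() { throw new InvalidOperationException("boom"); }
}

static void Demo()
{
	try
	{
		Invoke(typeof(Worker).GetMethod("Run"), new Worker());
	}
	catch (InvalidOperationException ex)
	{
		// The original exception type is catchable again, and ex.StackTrace
		// still points into Worker.Run() rather than at the re-throw above.
		Console.Error.WriteLine(ex);
	}
}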

The person that made this ‘internal’ should be flogged. It is such an easy solution, and it is perfectly safe for all types of exceptions. It appears it will even work with remoting, serialization, cross app-domain calls, etc.

My first request for .Net 5.0:

partial class Exception {
	public void PreserveStackTrace();
}

Updated: Apparently this isn’t a new hack, I should have done some google’ing ;)

 

After writing my last article I began to wonder what people ‘in-the-know’ thought the advantages of TDD vs integration testing were. So I quickly turned to my new favorite site, stackoverflow. After reviewing several questions I came across one entitled “Is Unit Testing worth the effort?”, whose accepted answer had 113 votes. So if you haven’t already, click the article title and read the arguments for using TDD, then read on for my responses.

Most of these apply to Integration Testing as well as TDD, with a few that might be questionable. Let’s discuss those:

2. TDD helps you to realise when to stop coding. Your tests give you confidence that you’ve done enough for now and can stop tweaking and move on to the next thing.

Assuming you’re not ‘master-planning’, this should not be a problem. Writing your code as needed (YAGNI) is a principle that stands apart from TDD. This developer ‘tendency’ is also easily mitigated with requirements on test coverage %; the ‘over-engineered’ solution is less likely to be introduced if the developer is required to hit 100% functional coverage in integration testing.

4. TDD helps with coding constipation. When faced with a large and daunting piece of work ahead writing the tests will get you moving quickly.

I haven’t heard this argument before. *shrug* +1 for TDD, but I really don’t have this problem. I (and I’m sure most Sr Dev/Architects would agree) continually break down large tasks into small achievable goals every time I write a piece of functionality. Perhaps there is merit here for newbies, but I would guess that TDD as a whole is very beneficial for controlling the damage a newbie can cause. For the rest of us, simply writing TDD tests doesn’t mean you’ve adequately broken down the conceptual model of the coding problem.

5. Unit Tests help you really understand the design of the code you are working on. Instead of writing code to do something, you are starting by outlining all the conditions you are subjecting the code to and what outputs you’d expect from that.

I’m not sure how this applies to the ‘design of the code’; however, the latter part of the statement is valid. Capturing the ‘behavioral contract’ of an object’s interface(s) with tests is essential. This is and should be done by writing the client code first, even if it is just pseudo-code. You should not be ‘throwing together’ interfaces without having reviewed their intended use. If TDD does this for you, great; I myself believe that the typical TDD (single AAA pattern) does not allow me to truly ‘feel out’ the client code. You can’t get a sense of how difficult an interface is to use when you only access a single method/member at a time. It breaks the flow of the intended client usage pattern into small granular chunks, and that changes the things that you find cumbersome or difficult. TDD (IMO) is not a valid experience of writing the client code.

6. Unit Tests give you instant visual feedback, we all like the feeling of all those green lights when we’ve done. It’s very satisfying. It’s also much easier to pick up where you left off after an interruption because you can see where you got to – that next red light that needs fixing.

This I can totally see and agree with. Almost all developers enjoy a feeling of accomplishment from their work, and while working on large projects it can often be difficult to obtain. I think everyone needs small-milestone gratification from their efforts; I enjoy the check-in to integration. When my build tells me I’m tested, working, and ready to be used, that is my moment of joy. I may not achieve this 20-30 times a day, but the two or three times I do are very gratifying.

So in summary I again attest that TDD is cool and all, but it is not essential to a good piece of software. Testing, however, is required, and the biggest thing I like about TDD is that it finally got the rest of you writing unit tests :) I’ve been doing integration testing since 2000 and using NUnit+coverage since 2002, and I must say hardly anyone seemed to care about testing their own code back then.

 

I was viewing the comments on a recent post entitled Integration Tests Are a Scam when I ran across this:

Integration tests are needed
A Mars rover mission failed because of a lack of integration tests. The parachute subsystem was successfully tested. The subsystem that detaches the parachute after the landing was also successfully (but independently) tested.

On Mars when the actual parachute successfully opened the deceleration “jerked” the lander, then the detachment subsystem interpreted the jerking as a landing and successfully (but prematurely) detached the parachute. Oops.

What a great story. I concur with the author that “integration tests are needed”; moreover, IMHO if you’re going to have only one or the other, integration tests are far more important than TDD/isolation tests.

I have not yet delved into mock frameworks or the like. I just don’t think they are necessary to do a good job of testing. My own library on code.google.com (IMO) speaks volumes about testing code without anything fancy. OK, my tests are ugly, I agree; however, they do catch most of the behavior of the public interfaces. I don’t use reflection to test code, I don’t expose members solely for the purpose of testing, and yet I continually average 95% statement-level coverage. My build asserts that 100% functional coverage was obtained, and this helps keep dead or useless code out of the source tree.

I’m not much on the whole TDD thing, as anyone can tell by reading my tests. I do strive to design client code first, which I believe is very important. I also endeavor not to over-engineer things, adding what I need as I need it rather than building it all up front. I test everything I write. Why do I need TDD to do what is obvious and should be done as a natural part of development? Maybe I’m just too old to get it.

 

Ok, I just found the best site I’ve stumbled across in a long time. http://www.antiifcampaign.com/ Excellent job guys, and mad respect for giving this issue its very own online site/campaign. Join me and the many others to move the community in a positive direction. BTW, I found this indirectly on a great post about the [...]

 

For those of you struggling to understand the different approaches to testing being heralded in by the TDD community, I strongly encourage you to read the following article: TDD Tests are not Unit Tests. Stephen does an excellent job of detailing the general purpose behind the testing styles and really clarifies for me why I tend [...]

 

Recently I’ve been following Roy’s ISerializable blog series on test-case reviews: Test Review #1 – NerdDinner, Test Review #2 – ASP.NET MVC, Test Review #3 – Unity. They have been fairly informative and full of insight into Roy’s view of proper testing. One thing he keeps commenting on continues to strike a nerve of mine. [...]

 

I seem to find myself talking with developers that have some fixed notion of agile development. They speak like it’s an implementation of XP (eXtreme Programming), or TDD (Test Driven Development), or some other fairly concrete methodology. Maybe it’s just me, but I really think this misses the whole point. In my world, [...]

 

This is the one, the only, the quintessential rule of programming. AKA the KISS method (Keep It Simple Stupid): to me, words to live by. From the implementation of a simple class to the design of a complex application, you constantly have to keep this in the forefront of your mind. Far too often I [...]