Recently I published a set of B+Tree benchmarks comparing a few of the available libraries against my own C# BPlusTree implementation. The author of RaptorDB wrote to me and requested that I review the latest version, 1.7 of RaptorDB.

The points you made regarding RaptorDB were all valid in the previous version (pre the current v1.7), I would much appreciate if you could test again with the new version and tell me the results in comparison.

The Good:
I will say this version does perform better on seeks and is now reading at around 180k records per second. RaptorDB now runs to completion both in single-threaded and multi-threaded benchmarks which is a nice improvement.

The Bad:
The problem with benchmarks is that they must assume the library is doing what it is told to do to eliminate skewing the results with validation. After running the latest version (1.7) in a key-value test harness I found the multi-threading support is still very broken and the library is still corrupting the store on a single-threaded test.

The Ugly:

  1. The new Shutdown() method does not shut the storage down and now causes exceptions when the domain unloads.
  2. Missing use of thread synchronization primitives. Rather than using locks and signals this library uses mutable properties and Thread.Sleep().
  3. This library is so far from being thread-safe it’s really kinda scary that they claim it would be. There is no locking or synchronization of any kind around object state manipulation.
  4. Critical state exclusively stored in volatile memory. There are several things that are managed and stored in-memory only. This means that your going to have issues if you process crashes even when not actively using the store.
  5. Usages of a polling thread for indexing the data is really unsettling, especially as there is no way to control it’s behavior or scope. As I said, I believe this is the most significant design flaw.
  6. AFAIK from reading the code there is no way to enumerate only the ‘current’ records, rather you are forced to see all previous versions.
  7. Inconsistent use of int/long when dealing with the number of records. Basically you’re limited to int.MaxValue number of records including all previous editions of records.
  8. Manipulation of lists that are also enumerated without any syncronization.
  9. I’m always concerned by finding “GC.Collect(2);” and even “Thread.Sleep(100); // breather” in any library. Forcing collection is not a fix for abuse of the LOH, nor does your computer need an arbitrary break from your code.
  10. Console logging left active and no way to disable it. This is another nasty thing to do in a library, this should be moved to an event, or even a TextWriter property rather than directly accessing Console methods.

 
You might be supprised to hear I am very hopeful for this library. By the above commentary you might think I’m just trying to bash it for no apparent reason; however, you would be wrong. I’m being harsh, I agree, but I’m trying to communicate to this author, and to the community, what things could be improved upon when publishing an open source library.

Suggestions: I think RaptorDB would best be served by first focusing on single-threaded access and making that use-case work well. I would get rid of the ‘Indexing’ thread and index on insert even if only in memory. Remove every occurrence of Thread.Sleep(), Console.Write(), lock(), and the ‘thread control’ booleans _PauseIndex, _isInternalOPRunning, etc. Remove the uses of List, use a custom linked-list if necessary. Make sure that the log state is persisted. Fix the enumeration to correctly enumerate the key/value pairs. Remove usages of int and replace with long where appropriate, especially in record numbers and b-tree node references and values. Move the code-base to a source-control server, code.google.com, github.com, codepex.com anywhere the community can contribute. Then test, test, and test some more…

Mostly I want to impress upon the need for testing. I’ve spent 100′s of hours testing the BPlusTree. The NUnit test suite has over 100 tests and touches every single method public or private. I’ve manually reviewed the coverage and made certain that all major if/else branches are being exercised. I’ve written countless test harnesses allowing it to run non-stop for days on end. I sincerely hope RaptorDB will someday receive the same level of attention to testing and verification as I believe it shows great potential.

All things considered this library still fits in the ‘interesting but unusable’ category.

Comments