Library

So, just finally made time to port the last project out of my old Google code repo. So here are all the links for the new packages and repositories:

BPlusTree (and supporting parts of Library) -> CSharpTest.Net.Collections
https://github.com/csharptest/CSharpTest.Net.Collections
https://www.nuget.org/packages/CSharpTest.Net.Collections

Library\Commands -> CSharpTest.Net.Commands
https://github.com/csharptest/CSharpTest.Net.Commands
https://www.nuget.org/packages/CSharpTest.Net.Commands

RpcLibrary -> CSharpTest.Net.RpcLibrary
https://github.com/csharptest/CSharpTest.Net.RpcLibrary
https://www.nuget.org/packages/CSharpTest.Net.RpcLibrary

Library\Tools\* -> CSharpTest.Net.Tools
https://github.com/csharptest/CSharpTest.Net.Tools
https://www.nuget.org/packages/CSharpTest.Net.Tools

The former package for CSharpTest.Net.BPlusTree has been deprecated. This package depended upon the CSharpTest.Net.Library and I wanted to make the package stand-alone. To avoid causing unneeded headaches I’ve chosen to rename the package to eliminate the confusion with the previous version(s). The new Collections package contains all the BPlusTree code, plus the code within the Library that it depended on. Hopefully this stand-alone version will make it much more usable.

For all the packages, my goal was to not change code, add features, or fix bugs during the move. This is 99% the case, a few changes were made to reduce dependencies in some cases, but the overall code remains basically the same.

So to start with what does “LurchTable” stand for?

Least
Used
Recently
Concurrent
Hash
Table

Of course I am dyslexic so I can get away with swapping U and R around. My dyslexia aside, It’s a fair representation of what the class does. It essentially is a ConcurrentDictionary that keeps and maintains a linked-list of nodes in specific way. It does differ significantly from ConcurrentDictionary in a few very important ways…

The number of hash buckets available can not change after construction. This keeps the code simple and maintainable while providing all the tread-safety goodness. This is also going to be the biggest change from the ConcurrentDictionary. The LurchTable is typically used with a hard-limit on items. For example a cache might allow no more than 10k items. So we probably want at least 5k hash buckets, each ‘bucket’ is just an integer so this would pre-allocate 20kb of memory as an int[]. On the extreme end, if you were to supply int.Max as the hash size you would exceed .NET’s ability to allocate the memory required.
Key/Value pairs (entries) are allocated and expanded dynamically as-needed. Unused entries are tracked in a separate linked-list. Unlike the .NET Framework’s Dictionary that will fail when adding the ‘nth’ entry due to an OutOfMemoryException being raise, the LurchTable does not allocate all entries as a single array. The allocation size is specified (or derived) at construction and the LurchTable will allocate a new array of this size and ‘append’ it to the existing allocated entries. This preserves time and memory as the contents of the dictionary fluctuate and prevent the LurchTable from attempting excessively large allocations.
The linked lists, both used and unused entries, are maintained via lockless linked lists. The use of non-blocking linked lists allows for a high-degree of concurrent modifications to be made to the contents of the collection. This approach also greatly reduces the contention on the ‘free’ entries required to find an empty entry to use for a new item. In addition, the LurchTable maintains several ‘free’ lists to even further reduce contention.

Linking Entries

Entries can be either non-linked, linked by insertion, lined by insertion or modification, or linked by insertion, modification or access. The internal linking strategy is determined at construction by specifying one of the LurchTableOrder enumeration members (Access, Insertion, Modified, None). When ordering by anything other than None, a hard-limit on items can be specified at construction as well. When a hard-limit is set, the insertion of n+1 items will cause the ‘oldest’ item in the ordered list to be removed. Typically all this is specified at construction by using the more verbose ctor on the LurchTable:

    public LurchTable<TKey, TValue>(
        LurchTableOrder ordering,         // One of Access, Insertion, Modified, or None
        int limit,                        // Hard-limit number of items, or int.MaxValue for no limit
        int hashSize,                     // The number of hash buckets to allocate (adjusted up to a prime number)
        int allocSize,                    // The number of items to allocate at once (power of 2 that is < = allocSize)
        int lockSize,                     // The number of locks to allocate (adjusted up to a prime number)
        IEqualityComparer<TKTKey> comparer  // The comparer to use for the keys
    )

Other Deltas from the ConcurrentDictionary

New methods not typically found on a dictionary include the ability to Peek, Dequeue, or TryDequeue the oldest item in the collection. Peek will return the oldest entry based upon the link order. TryDequeue will attept to remove the oldest entry based upon the link order and can optionally take a predicate to control removal. Lastly, Dequeue is a tight-polling loop on TryDequeue and returns only after an item is successfully dequeued (Note: For obvious reasons Dequeue should be used cautiously).

All enumerations of the LurchTable are thread-safe and in Hash-order. Due to the lockless nature of the internal linking, it is not possible to enumerate the contents of the collection in the linked order. Enumeration of the links could easily be added; however, it would not be thread-safe and thus is not currently implemented.

All methods on the collection are thread-safe except the “Initialize” method. The Initialize() method recreates the internal ‘entries’ collection and essentially removes all items in O(1) whereas the Clear() method is now an O(n) time operation.

Events for ItemAdded, ItemUpdated, and ItemRemoved are exposed from the collection and are executed in-lock with the operation. While I don’t like events that execute from synchronized code, there really isn’t a better way to deal with this. Since I heavily rely on ItemRemoved events to flush data from a LurchTable cache to primary store, the event must act as an atomic operation to the consumer of the collection. All events are fire after internal structures have been modified and just prior to lock-release. Care should be taken not to raise exceptions from these events as these will propagate to the caller attempting to add/update/remove the item.

Uses for LurchTable

LRU Cache An LRU (Least Recently Used) cache is the most common of applications for this structure. Simply configure a hard-limit on the items and use LurchTableOrder.Access at construction. Then from the caller use the GetOrAdd call to either pull from cache or fetch from primary and add to cache in a single call.
Write-through Cache Another common usage might be a write-behind/delayed-write strategy for an external storage mechanism. Constructing with a hard-limit and using LurchTableOrder.Modified will give you a read-through/write-through cache. Use the GetOrAdd to fetch and any of the available set operations to write, then hook the ItemRemoved to flush to external storage.
Throttling Work Queue An interesting use (or abuse?) of the LurchTable is using it to distribute work across several threads. When a producer of work can outpace the ability of the consumers one needs to throttle the producer. One way is to simply sleep the producer until more work can be queued, another is to let the producer also consume work. Construct with LurchTableOrder.Insert and a hard-limit on the back-log to allow. Hook the ItemRemoved event to implement the work item processing. Start one or more threads producing work items by adding them to the dictionary. Finally start one or more threads consuming data by calling Dequeue/TryDequeue. This works well if the work to be performed is small (since the consumer can block the producer), mileage may vary for longer running tasks.

I’ve used this collection extensively for the past year and have been very pleased with the results and ease of use. It’s probably one of the more versatile and better pieces of code I’ve put together in a while. Give it a try by just including the source directly (/browse/src/Library/Collections/LurchTable.cs) or by using the NuGet package CSharpTest.Net.Library. You can also find the online help at http://help.csharptest.net.

In my previous post “Why GUID primary keys are a database’s worst nightmare” I spoke about some of downsides to using GUID primary keys. In this post I’m going to focus more specifically on the .NET Framework’s implementation of System.Guid. I have a few complaints about this class, let’s cover them…

1. Guid.ToByteArray()

This, IMHO, is one of the worst possible things about a Guid. First this method returns a byte array in little-endian format. Not only is this directly opposite the stated standard GUID format, but it also means the bytes do not compare the same as the structured value. Thus when you combine System.Guid with a sequential guid generation you wind up with either a structure that doesn’t compare sequentially, or a byte array that doesn’t compare sequentially. Yes you can always swap bytes (1,4), (2,3), (5,6), and (7,8), but really? should I have to?

The second issue I have with this method is that it absolutely requires me to allocate the bytes on the heap. While this may not sound like such a big deal, if you are moving large volumes of data and serializing those Guids the only way you can (as byte[]) it will thrash the GC. I’m hopeful that in some future version of .NET we will find a Guid with a ToByteArray(GuidFormatType, byte[], offset) where GuidFormatType would allow little-endian or big-endian formats.

2. Guid.GetHashCode()

This one does puzzle me. They choose to completely ignore bytes 9, 10, 12, 13, 14, and 15 when computing a Guid’s hash code. Ok so maybe with random Guids this really doesn’t matter, but when you start trying to use Guid with sequential values this starts to impact hash performance. If, as in my example on the previous post, bytes 1-8 are date-time, then your entire hash value amounts to the entropy introduced in bytes 11 and 16. To be specific, byte 11 xors with the MSB and byte 16 xors with the LSB.

3. Guid.NewSequentialGuid()

Don’t you wish that .NET could just do this for us? We’ve know this to be a problem, and have know of a resolution to it, since 2002. They’ve even added “NEWSEQUENTIALGUID()” to Sql Server. Wouldn’t you think they could get this done for us? I guess until they fix the endianness of ToByteArray it’s really a moot point anyway.

Of course one option is to just PInvoke UuidCreateSequential.

static class Win32
{
    [DllImport("rpcrt4.dll", SetLastError=true)]
    public static extern int UuidCreateSequential(out Guid guid);
}

4. Guid.(Missing Properties)

Another annoyance is the lack of exposing the values of the Guid. Would it have really been that hard to expose the same integer, two shorts, and 8 bytes that we can pass to the constructor? One could then create an efficient binary serializer for the Guid type without a GC allocation and without worrying about byte-swapping little-endian values. In addition, the underlying randomization could then be reused without using the ToByteArray method. Maybe this isn’t a big deal for most people, but I’m still miffed that (AFAIK) there is not a single thread-safe means of creating a random value without allocating a byte[].

A few weeks ago I published NuGet packages for version 1.11.924.348.

I also managed to get protobuf-csharp-port on board. I absolutely live by protobuffers.

Finally I recently published a complete protobuf services rpc layer build on the RpcLibrary and protobuf-csharp-port. Of course much of the code there could be used with any raw transport like WCF, TCP, Named Pipes, etc so check it out. This library is also available on NuGet:

Google.ProtocolBuffers.Rpc

If you haven’t tried NuGet you should look at it. If you manage an open source library you really need to get published there. It has allowed me to publish projects that have dependencies without the need to check those dependencies into source control. This reduces the burden on source control, allows for smaller archives, and makes it easier to upgrade dependencies. So trash the binaries from your source control and get started with NuGet.

A very minor update to the library this past weekend.

Addition of Cyrpto.SecureTransfer to provide file transfers via shared public keys.
The Crypto.AESCryptoKey now has ToArray() and FromBytes() like other keys.
HashStream can now aggregate read/write calls to actual storage stream while computing the hash.
The Crypto.Hash class received a new method, Combine(…)
Html.XmlLightElement and related classes are now fully modifiable.
BPlusTree.Options now supports a ReadOnly property to ensure no writes at the file handle level.

Last year I ran across this 2009 post by Bruce Schneier entitled “Another New AES Attack“. It got me thinking about and dissecting the Rijndael algorithm which most of you know as AES (Advanced Encryption Standard). This research surprised me, I found that AES has only three variants. These variants are best known by their key [...]

The help site doesn’t suck as bad as did before. Make me feel better and take a look: http://help.csharptest.net/ I’ll know if you don’t. So I finally gave up on my last effort at getting help automated. To recap my struggle: basically Sandcastle, the original help system I used, was a complete flop. I spent [...]

Just published the latest release v1.10.1124.358 of the C# Library hosted on google code. Major new functionality in the form of the new assembly CSharpTest.Net.RpcLibrary. This library wraps PInvoke calls to the Win32 RPC APIs. This allows pure managed C# applications the ability to move byte arrays over the supported rpc protocols. The following is [...]

So first off this is not new, it’s been around for a long, long while. One issue I have with IDisposable is that for whatever reason .Net did not include a base class for handling the details… I admit you may not always be able to leverage it since you may have a base class, [...]

This one is actually something of a triviality, yet it can be very useful at times. Now a word of caution is due here, this object is built on System.Threading.Timer and should not be used to enqueue 12 million work items. Yet if you need to perform a simple task at some time in the [...]

csharptest.net

CSharpTest.Net.Collections v14.906.1403.1082

Introducing the LurchTable as a C# version of LinkedHashMap

Linking Entries

Other Deltas from the ConcurrentDictionary

Uses for LurchTable

Understanding the pitfalls of using System.Guid with sequential values

NuGet packaging available

Version v1.11.924.348 released

Using Rijndael to improve data security beyond AES

Back 2 business… New help is live, Thanks to Document! X

Version v1.10.1124.358 – Introducing the C# RpcLibrary

How to correctly implement System.IDisposable (disposable pattern)

How to delay execution of a c# Action delegate without a lot of work?

Search

Projects

Recent Updates