Almost the first thing you need to create a BPlusTree is going to be the key/value serializers which are implementations of the ISerializer<T> interface. They are also going to be one of the things that can make or break your first impressions of using the BPlusTree. A descent performing implementation is going to be important in order to get the most out of your BPlusTree. Aside from performance alone, there is one other area of concern, that of the semantic interface.
Implementing the ISerializer<T> Interface
First let’s take a quick look at the interface definition for those that are not familiar with it.
public interface ISerializer<T> { void WriteTo(T value, Stream stream); T ReadFrom(Stream stream); }
Simple enough, right? Well, not exactly. It seems that one thing is not obvious from this interface. The ‘stream’ parameter is not constrained in any way. One could seek to an offset they should not be at, or read past the end of the item they are suppose to be reading. Exceptions are also a bad idea here since it can prevent you from reading the tree. So the following guidelines should be used when implementing this interface for use with the BPlusTree:
- Always use an immutable class, a value type, or ensure that references to the class inserted/updated/fetched are not modified.
- When writing values you should try to avoid any case for throwing an exception.
- Always use a length-prefix or termination value to prevent reading more data than was written for this value.
- When reading you should only throw exceptions when you believe the data has been corrupted.
- Never read more data than was written for this record as this will cause subsequent values to be corrupted.
- Do not seek or use positional information from the stream, assume it is either append-only writing, or forward-only reading.
Primitive Serialization
For most primitive types you should not need to write one, instead use the The PrimitiveSerializer class (declared in CSharpTest.Net.Serialization). The PrimitiveSerializer class has static properties for many of the serializers you will need for primitives. These are all written using little-endian byte ordering for numeric values.
If you need to write a compact format there is another class to help you with numeric encoding. The VariantNumberSerializer class exposes serialization for protocol buffer-like variant encoding of integer values. Basically this can be used to store positive integer values in more compact format than using the PrimitiveSerializer.
Neither of these provide an array serializer for anything other than a byte[]. This can often be a desired type so here is a generic implementation that will serialize arrays given a serializer for the item type.
public class ArraySerializer<T> : ISerializer<T[]> { private readonly ISerializer<T> _itemSerializer; public ArraySerializer(ISerializer<T> itemSerializer) { _itemSerializer = itemSerializer; } public T[] ReadFrom(Stream stream) { int size = PrimitiveSerializer.Int32.ReadFrom(stream); if (size < 0) return null; T[] value = new T[size]; for (int i = 0; i < size; i++) value[i] = _itemSerializer.ReadFrom(stream); return value; } public void WriteTo(T[] value, Stream stream) { if (value == null) { PrimitiveSerializer.Int32.WriteTo(-1, stream); return; } PrimitiveSerializer.Int32.WriteTo(value.Length, stream); foreach (var i in value) _itemSerializer.WriteTo(i, stream); } }
Using Protocol Buffers with the BPlusTree
Protocol buffers make an excellent way to serialize data for the BPlusTree. They are compact, fast, and can be extended/modified over time without having to rewrite the records. Due to this I highly recommend using protobuf-csharp-port to generate the classes you will use. In addition to the serialization benefits mentioned this library is also built upon immutable types with builders. This is very important for the accuracy of the data since the BPlusTree makes no guarantees about when the instance will be serialized and it may use the same instance multiple times when reading. The following serializer will provide you a generic implementation that can be used with the protobuf-csharp-port library (NuGet Google.ProtocolBuffers).
using CSharpTest.Net.Serialization; using Google.ProtocolBuffers; internal class ProtoSerializer<T, TBuilder> : ISerializer<T> where T : IMessageLite<T, TBuilder> where TBuilder : IBuilderLite<T, TBuilder>, new() { private delegate TBuilder FactoryMethod(); private readonly FactoryMethod _factory; public ProtoSerializer() { // Pre-create a delegate for instance creation, new TBuilder() is expensive... _factory = new TBuilder().DefaultInstanceForType.CreateBuilderForType; } public T ReadFrom(Stream stream) { // Use Build or BuildPartial, the later does not throw if required fields are missing. return _factory().MergeDelimitedFrom(stream).BuildPartial(); } public void WriteTo(T value, Stream stream) { value.WriteDelimitedTo(stream); } }
What about using protobuf-net? This is also very easy to implement and would be the preferred way of code-first serialization (no proto description, just a .NET class/struct). As we did with the protobuf-csharp-port you must length-prefix the messages. Avoid using the IFormatter implementation in protobuf-net since this has a semantic incompatibility with BinaryFormatter and other implementations (it is not length prefixed and reads to end-of-stream). Here is a generic class for serialization using the protobuf-net’s static Serializer class:
public class ProtoNetSerializer<T> : ISerializer<T> { public T ReadFrom(Stream stream) { return ProtoBuf.Serializer.DeserializeWithLengthPrefix<T>(stream, ProtoBuf.PrefixStyle.Base128); } public void WriteTo(T value, Stream stream) { ProtoBuf.Serializer.SerializeWithLengthPrefix<T>(stream, value, ProtoBuf.PrefixStyle.Base128); } }