2011 October

How to tame Google +1 button performance problems

Coding, Home

Oct 282011

Let’s face it the clever guys at Google have more bandwidth than the rest of the nation combined. What better to do with all that excess pipe ;) There are of course numerous posts about the performances of +1 buttons. If you missed them there is the ever popular Google +1 Button Performance Review, and Why You Should Not Implement the Google +1 Button At This Time. Both of these have since been updated to indicate that the issues have been addressed. While I will admit that they have solved making the buttons asynchronous (If you implement it correctly) they are still far from done IMHO.

As you may or may not be aware I’ve recently added plus-one buttons to all the post titles on my site in the hopes that you will click one for me. Go click one. I mean it. Never-mind. Anyway the +1 button is cool but it currently performs like a dog. For every button you display an iframe is created that internally uses 5 different resources. Then of course there are two scripts on containing page. The worst part for me is waiting to see these ‘magic’ buttons just pop up out of the blue a second or so after the page loads. All-in-all it is not an impressive experience, so I decided to do something about it.

Completely Lazy Loaded +1 Buttons

It turns out it is really easy to get the experience I wanted. I wanted the +1 button images to show up immediately and then if someone desires to interact with them we’ll create the ‘real’ +1 button. The following solution was created to do just that, inline images and no scripts until required.

The first thing we need is a new HTML placeholder to use instead of the “g:plusone” tag. I chose to use an anchor tag, but a div works too:

<a class="plusone"
    title="+1"
    onclick="return false;"
    href="http://csharptest.net/index.html"
    onmouseover="renderPlusOne({parent:this,href:this.href});" >
</a>

If you are using a div tag you can remove the onclick handler, but this gets us… what? a link? … no we some css to turn this anchor into a proper placeholder:

.plusone {
    display: inline-block;
    width: 24px; height: 15px;
    padding: 0px; margin: 0px; overflow: hidden;
    border: none; background-image: url("http://csharptest.net/images/plusone.png");
}

The image I’m using ‘plusone.png’ was extracted from +1′s sprite.png for the size I wanted, small. So at this point we have a nice place-holder that only needs the single image resource to render. Now what do we do with the scripts? Well we need to implement the ‘renderPlusOne’ method from the ‘onmouseover’ event above. Google was nice enough to provide a gapi.plusone.render method that allows us to explicitly render the +1 button. The only problem is before we call that method we need the plusone.js script on our page. So we have two choices, load it even though we may not use it, or defer loading it until the user wants it. Obviously I’m going after the later approach. If you don’t mind the script, just load it and you’ll only need to call render directly in the onmouseover event. Assuming you want to lazy load all scripts you just need to include the following javascript fragment:

var loadgapi=(function(item){
  loadgapi=null;
  var po=document.createElement('script');po.type='text/javascript';po.async=true;
  po.src = 'https://apis.google.com/js/plusone.js';
  po.onload = (function() { renderPlusOne(item); });
  po.onreadystatechange = (function() { renderPlusOne(item); });
  var s=document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s);
});
function renderPlusOne(item) {
  if(loadgapi != null) { loadgapi(item); }
  else {
    try {
      gapi.plusone.render(item.parent, { href: item.href, size: "small", annotation: "none" });
      item.parent.onmouseover = null;
    } catch (e) { }
  }
}

I’ve run this in the only browser that matters, Chrome, as well as a few of the others like Safari, Firefox, and IE (8 and 9). Try it for yourself, all the plus one buttons on this page are using this code. Feel free to drop a comment by following the title link.
Sample:

NuGet packaging available

BPlusTree, Building, Home, Library, Projects, Protocol Buffers, RpcLibrary

Oct 182011

A few weeks ago I published NuGet packages for version 1.11.924.348.

I also managed to get protobuf-csharp-port on board. I absolutely live by protobuffers.

Finally I recently published a complete protobuf services rpc layer build on the RpcLibrary and protobuf-csharp-port. Of course much of the code there could be used with any raw transport like WCF, TCP, Named Pipes, etc so check it out. This library is also available on NuGet:

Google.ProtocolBuffers.Rpc

If you haven’t tried NuGet you should look at it. If you manage an open source library you really need to get published there. It has allowed me to publish projects that have dependencies without the need to check those dependencies into source control. This reduces the burden on source control, allows for smaller archives, and makes it easier to upgrade dependencies. So trash the binaries from your source control and get started with NuGet.

Review: RaptorDB releases version 1.7

Testing

Oct 162011

Recently I published a set of B+Tree benchmarks comparing a few of the available libraries against my own C# BPlusTree implementation. The author of RaptorDB wrote to me and requested that I review the latest version, 1.7 of RaptorDB.

The points you made regarding RaptorDB were all valid in the previous version (pre the current v1.7), I would much appreciate if you could test again with the new version and tell me the results in comparison.

The Good:
I will say this version does perform better on seeks and is now reading at around 180k records per second. RaptorDB now runs to completion both in single-threaded and multi-threaded benchmarks which is a nice improvement.

The Bad:
The problem with benchmarks is that they must assume the library is doing what it is told to do to eliminate skewing the results with validation. After running the latest version (1.7) in a key-value test harness I found the multi-threading support is still very broken and the library is still corrupting the store on a single-threaded test.

The Ugly:

The new Shutdown() method does not shut the storage down and now causes exceptions when the domain unloads.
Missing use of thread synchronization primitives. Rather than using locks and signals this library uses mutable properties and Thread.Sleep().
This library is so far from being thread-safe it’s really kinda scary that they claim it would be. There is no locking or synchronization of any kind around object state manipulation.
Critical state exclusively stored in volatile memory. There are several things that are managed and stored in-memory only. This means that your going to have issues if you process crashes even when not actively using the store.
Usages of a polling thread for indexing the data is really unsettling, especially as there is no way to control it’s behavior or scope. As I said, I believe this is the most significant design flaw.
AFAIK from reading the code there is no way to enumerate only the ‘current’ records, rather you are forced to see all previous versions.
Inconsistent use of int/long when dealing with the number of records. Basically you’re limited to int.MaxValue number of records including all previous editions of records.
Manipulation of lists that are also enumerated without any syncronization.
I’m always concerned by finding “GC.Collect(2);” and even “Thread.Sleep(100); // breather” in any library. Forcing collection is not a fix for abuse of the LOH, nor does your computer need an arbitrary break from your code.
Console logging left active and no way to disable it. This is another nasty thing to do in a library, this should be moved to an event, or even a TextWriter property rather than directly accessing Console methods.

You might be supprised to hear I am very hopeful for this library. By the above commentary you might think I’m just trying to bash it for no apparent reason; however, you would be wrong. I’m being harsh, I agree, but I’m trying to communicate to this author, and to the community, what things could be improved upon when publishing an open source library.

Suggestions: I think RaptorDB would best be served by first focusing on single-threaded access and making that use-case work well. I would get rid of the ‘Indexing’ thread and index on insert even if only in memory. Remove every occurrence of Thread.Sleep(), Console.Write(), lock(), and the ‘thread control’ booleans _PauseIndex, _isInternalOPRunning, etc. Remove the uses of List, use a custom linked-list if necessary. Make sure that the log state is persisted. Fix the enumeration to correctly enumerate the key/value pairs. Remove usages of int and replace with long where appropriate, especially in record numbers and b-tree node references and values. Move the code-base to a source-control server, code.google.com, github.com, codepex.com anywhere the community can contribute. Then test, test, and test some more…

Mostly I want to impress upon the need for testing. I’ve spent 100′s of hours testing the BPlusTree. The NUnit test suite has over 100 tests and touches every single method public or private. I’ve manually reviewed the coverage and made certain that all major if/else branches are being exercised. I’ve written countless test harnesses allowing it to run non-stop for days on end. I sincerely hope RaptorDB will someday receive the same level of attention to testing and verification as I believe it shows great potential.

All things considered this library still fits in the ‘interesting but unusable’ category.

BPlusTree performance at large volumes (200 meeLLLLion)

BPlusTree, Design, Home

Oct 132011

Ok I swear this is the last B+Tree chart, graph, or metric I’ll post… at least for a while. I wanted to hit new limits so I fired up 6 threads on my six-core and let them rip… After three hours I finally stopped the run to look at the results.

Each of the six threads had inserted over 34 million records, a total of over 200 million records inserted in 3 hours averaging over 18k inserts per second. The data file had reached 5.79 GB (6,221,824,032 bytes). I guess that isn’t too bad considering I was using my slow 2tb WD caviar green (I didn’t want to thrash my SSD). Opening the file doesn’t do much other than to read the first block, from a cold start it took 62ms. The first thousand random seeks took an average of 10ms and thereafter settled in at around 9ms. Sequential scans of 1000 keys at a random offset takes about 500ms averaging .5ms each.

The following is a graph of one thread (all 6 near identical) showing the time in seconds (vertical) and total records inserted in millions (horizontal). Obviously the point being made here is that the line is straight :)

Time / Records
B+Tree Insert Progression

If you are wondering why these numbers don’t line up with the B+Tree benchmarks previously posted there are two reasons. First I totally screwed up the multi-threaded test for those numbers. The second difference was the amount of data/record size. The above graph shows inserts of an Int32 key and a Guid value or 20 byte records. The benchmark used a Guid key on a 100 byte value or 116 byte records. The major reason for reducing the record size was to improve throughput so I could target a larger number of records.

BPlusTree benchmarks and comparisons

BPlusTree, Home

Oct 122011

Test Overview
The benchmark results below are from a single thread on a process with the thread-affinity set except two. The BPlusTree(Threaded) and obviously MySql were not constrained. All data used during the test was generated before the test ran. The number displayed in the logarithmic horizontal axis is the average number of operations completed per second. This is the worst average of the last 5 of 8 total runs.

Benchmark Results

BPlusTree
The strong showing here is most certainly the select time. BPlusTree and STSdb outpace all the rest by a full order of magnitude. As you can see though the insert, update, and delete times, though respectable, are not near some of the other options.

BPlusTree (Threaded)
This test used 4 CPUs and 4 threads to produce the throughput shown. The general goal here was to demonstrate the throughput was higher than a single-threaded test. Showing a 50% gain on insert and update times across 4 threads gives you an idea of the level of resource contention.

So why are no other tests repeated as a multi-threaded version? Well quite simply I tried and all of the other libraries either did not support threading, or failed to prove thread-safe during testing. Raptor DB corrputed state and hung the application. Even the beloved Berkeley DB blew up on threaded deletes both in transactional and non-transactional modes. While MySql worked the benefits in performance were not worth noting here.

RaptorDB
Although this library did produce some good numbers on modifications I really have to call this one a non-starter. Not only can you not close/unload it, but it basically does nothing more than append the data to a log and perform a manual lookup (at least until their indexing thread catches up). This, IMHO, is just a nasty design flaw and shows up well in their blazing slow seek performance. I managed to crash and/or corrupt the store several times, even on a single thread. As for the claimed multi-threading capabilities of this library it’s a joke, this library is far from thread safe. Frankly you would be better off appending the data to a single file with a Dictionary<K,V> to give you the file offsets. That is basically what it seems to be doing behind the scene, even so far as to make you call SaveIndex() to persist the index. (Update: Review of RaptorDB version 1.7)

BPlusDotNet
Nothing spectacular here unless you need Java/C++ compatibility which is nice. The select times on this seem to be sub-par and there is no support AFAIK for multi-threading. All-in-all it behaved well in every single-threaded test. Unlike RaptorDB it seems stable enough to use in the right environment. It even performs much better than my own for deletes, I’ll have to take a look at that ;)

STSdb
This library really blew me away. Although I could not find any documentation about multi-threading it is so wicked fast I’m not sure I care. The only down side is that this library is GPL licensed unless your willing to pay for commercial use. I don’t know what the cost of it is but if you need that kind of performance it may be worth investigating. Definitely a two-thumbs-up library if you have plenty of ram and hard-drive space to work with ;)

BerkeleyDB
Again I was surprised to see this outpace a fully managed solution by a margin of 3:1. Outside of it’s relatively slow select speed it was a consistent high-performer. I will warn you that I was able to corrupt it’s state a few times, mostly in multi-threaded deletes which is why I did not post a threaded performance chart. If you’ve never worked with it, the library is cumbersome to work with due to the infinite number of options. I’ll also warn you to stay away from it’s transacted mode, with transactions enabled this performs very poorly especially for inserts (1350/sec).

MySQL
Just threw this in to give a comparison to a well-known RDBMS solution.

Resource Usage
Inserts per second

The chart above indicates kilobytes per 1000 records of 116 bytes. All considered everyone was close on resource usages except the clear winner and loser. I was using the GC for obtaining memory usage so Berkely does not have memory usage information; however, it’s disk cost was very good. The looser here is not really a looser based on it’s performance, but clearly we know where STSdb is getting all the performance from :)

Summary
Out of all the purely managed, free, and open source (non-GPL) I still think the BPlusTree stands out as a winner based on it’s low memory usage, fast seek times, and multi-threading support that actually works. If I were willing to buy it though I’d have to go with STSdb and just buy some more memory and hard-drive space :)

BPlusTree and ACID?

BPlusTree

Oct 122011

Before I go much farther talking about the BPlusTree I want to address one thing. Does the BPlusTree support ACID operations? (definitions below from wikipedia) Atomicity Atomicity requires that database modifications must follow an “all or nothing” rule. Each transaction is said to be atomic. If one part of the transaction fails, the entire transaction [...]

csharptest.net