Recently I ran across an article on the subject of Content Management Systems and their inability to separate content editing from content publishing. The article, titled “EditingPublishingSeparation” and written by Martin Fowler, is worth a read.

I completely agree with his assertion that, from an architectural point of view, the editing and publishing of content should be separated. I would, however, take the assertion much further than that: websites should NOT be capable of editing themselves. The mere idea of this is absurd, IMHO. I wrote CMS systems back in the late '90s, and even then it was obvious. You cannot secure a self-editing website.

Why is a self-editing website a bad idea?

1. The group take-down. To say most CMS systems have a vulnerability or two is putting it mildly. Attackers take these vulnerabilities and then use automated software to seek out sites running that CMS and exploit them. This allows them to inexpensively disperse malware to a large audience in a very short period of time. This, IMHO, is the worst thing about running a CMS solution: nobody specifically targeted your site, it just happened to be running software they knew how to attack. No provocation needed, you got taken down along with 10,000 other unfortunate people.

2. It runs in the browser. The issue here is that some form of logon allows users to modify the content on the web server. This means the user's horribly insecure browser environment is entirely in control of 'production' content, so a simple XSS script, a malicious browser plugin, or some other common vulnerability can allow an attacker to modify content. Browsers are the worst place to be editing content. Even with the advent of Windows Live Writer and other rich-client authoring tools, you still occasionally need to log into the website. These tools help, but they do not fix the problem.

3. Preview is not a preview. Almost all of the CMS systems out there will allow you to preview content before publishing it. Most of them get it wrong. CMS systems are increasingly moving to WYSIWYG display editing, where they modify the output HTML so that you can edit it, even in preview. This gives you no assurance about how the page will actually format and display, since the authoring widgets on screen change the HTML being rendered. Furthermore, while previewing a single page is possible, many CMS systems will not allow you to preview entirely new sections and navigation elements. Lastly, previewing an entire redesign of the site's look-and-feel, navigation structure, etc. is also not possible.

4. My web server runs DRY. CMS systems often fail to appropriately cache the rendered HTML. This produces lags in performance as your server reprocesses the same content against the template over and over again. I prefer my server to run as DRY as possible: Don't Repeat Yourself. There is just no point in reprocessing the content for every request.
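To make the point concrete, here is a minimal Python sketch of the idea (the names are illustrative, not any CMS's actual API): render each URL against the template once, then serve the cached HTML on every later request.

```python
# Hypothetical render cache: the template merge runs once per URL,
# not once per request.
html_cache = {}

def render(content, template):
    # The expensive step: merge the content into the site template.
    return template.replace("{{body}}", content)

def handle_request(url, content, template):
    if url not in html_cache:              # render only on a cache miss
        html_cache[url] = render(content, template)
    return html_cache[url]                 # later requests are a dict lookup
```

A static-publishing tool like HttpClone takes this to its logical conclusion: the "cache" is the entire site, rendered ahead of time.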

5. User-provided content. IMHO, user-authored content does not belong on your server. This is one of the driving factors behind #4 and is simply not necessary. Using Facebook or another discussion server is easy. If you need something fancier than what is freely available, go build it. Stand up a completely different site on a different domain with a completely different authentication model. Users should never log in to your site.

6. XCopy backup and deployment. Aside from backup and deployment, there is also the issue of applying a version control system to most CMS systems. This is one of my biggest pet peeves with CMS systems: they absolutely love to rely on a database back-end. Although some newer CMS solutions can use embedded SQL servers, most do not support it, and it is not an option if you are farming the content across several servers. I suspect most CMS sites are not being backed up regularly, and if the server is lost or its drive corrupted, they're likely to lose most if not all of their site.

What are my alternatives?

1. Find a better CMS. I'm not aware of a single CMS system in operation today that avoids the issues above. Please correct me in the comments if this is inaccurate; I'd love to know if one exists.

2. Use a CDN (Content Delivery Network). These are often very powerful tools and can be configured to avoid many of the issues mentioned above. If you are looking for one, I would consider CloudFlare a viable starting point.

3. Use HttpClone or a similar product. I'm sure there are other solutions with similar capabilities, but honestly I love using HttpClone. I use WordPress on the back-end and have a deployment script that automates the process end-to-end. Whether I'm publishing the result to a test server or to production, it's relatively easy once you get it working. The hard part was configuring the crawler to identify content I wanted removed or changed, and indexing for search. Once that was complete, I wrote a simple batch file to do the deployment that looks roughly like:

@ECHO OFF
HttpClone.exe crawlsite http://csharptest.net/
HttpClone.exe copysite http://csharptest.net/ http://csharptest.net /overwrite
HttpClone.exe optimize http://csharptest.net
HttpClone.exe index http://csharptest.net
HttpClone.exe addrelated http://csharptest.net
HttpClone.exe publish http://csharptest.net
mysqldump.exe -u root -ppassword --create-options --skip-extended-insert --databases csharptest --result-file=csharptest.sql

Basically, this crawls my locally running copy of this website (admin.csharptest.net) and captures the results. It then walks all the captured pages and changes references from admin.csharptest.net to csharptest.net, overwriting the content that was previously there. Next it performs a series of steps: optimizing the content, creating the search index, and injecting related-article links. Finally it packages and publishes all the content to the remote site and backs up the database. The entire site is switched to the new content instantly once it is ready. For small edits I can publish the content directly to production, but more often I push to a local site first to verify the content package.
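The rename/copy step above can be sketched in a few lines of Python (this is an illustration of the idea only, not HttpClone's implementation, which is .NET): given the captured pages as a mapping of URL to HTML, rewrite the editing host to the public host in both the URLs and the page bodies.

```python
def rename_site(pages, source, target):
    # Rewrite both the captured URLs and the page bodies from the
    # editing host to the public host, as the copysite step does.
    return {
        url.replace(source, target): html.replace(source, target)
        for url, html in pages.items()
    }
```

A real implementation would also need to handle relative links, redirects, and non-HTML resources, but the core of a "rename to a different domain" pass is just this textual rewrite over the snapshot.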

Obviously the most vulnerable part of the process is the code on the server that allows publication. This is why the entire thing requires the client and server to know each other's public key. They negotiate a session key, transfer the file, and sign/verify every request and response. This code uses the CSharpTest.Net.Crypto.SecureTransfer class from my library, if you are interested in the details.
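The sign/verify half of that exchange can be sketched with Python's standard library. This is a stand-in, not SecureTransfer itself: it assumes the public-key handshake has already produced a shared session key, and then tags every message with an HMAC so the other side can detect tampering.

```python
import hmac
import hashlib

def sign(session_key: bytes, payload: bytes) -> bytes:
    # Tag the message so the receiver can detect any tampering in transit.
    return hmac.new(session_key, payload, hashlib.sha256).digest()

def verify(session_key: bytes, payload: bytes, tag: bytes) -> bool:
    # compare_digest is constant-time, avoiding timing side channels.
    return hmac.compare_digest(sign(session_key, payload), tag)
```

Any request whose tag fails verification is simply rejected, so an attacker who can see or alter traffic, but does not hold the session key, cannot forge a publish request.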

The benefit of both client and server using a public/private key pair is that an observer knowing only one of the two keys can learn very little about the content being transferred. Obviously, if an attacker obtains the server's private key, they can replace the server (assuming some form of DNS poisoning or the like); however, they will not be able to forward the content to the actual server and still be able to read it. Just as obviously, if someone were to obtain my client private key, they could publish new or modified content to the server, since this is the only form of authentication. I will add that even with my client private key, they still cannot upload anything that is executable on the server. This leaves my server secure and intact, and all I need to do to recover is replace the client key and republish the content.

I wish the guys at WordPress or another CMS vendor would just do this out of the box.

 

If you've missed it, there is a great article entitled “Keep it secret, keep it safe” by Eric Lippert. It attempts to dissect the essence of typical crypto issues in plain English (i.e., crypto for dummies). He did a great job of explaining the difficulties in key management; it's worth a read.

I found it particularly interesting that he brought up this topic, since just days ago I released a “SecureTransfer” class. I'll get into the details of that later, but it is interesting here because it happens to be susceptible to the very issue he warns about. To put the problem in simple terms:

The best implementations of cryptography out there are only as secure as their key storage.

This is most certainly true of the SecureTransfer client/server classes. That doesn't mean implementing a secure communication channel is easy; far from it. It just means that *if* you've implemented a secure channel, its most obvious attack vector is to crack the key store.

Key storage for HttpClone

This very problem was one of the first things I had to address with HttpClone (which is now serving this website). To publish content from my local machine over HTTP, I need to know the server's public key, and the server needs to know my local public key. In addition, both client and server have to store their own private keys.

So I asked myself: what kind of assurances do I want regarding key security for HttpClone? It turns out that simply placing the keys in the web server's /bin directory is probably all that is required. If an attacker can modify files in my web server's bin directory, the game is already lost; they can freely change the assemblies, web.config, etc. and serve up any malicious content they want.

In the end I chose to add a level of password security to the private key file and then store that password elsewhere. Why? Well, you only need to remember back a year to when Microsoft released this announcement: “Important: ASP.NET Security Vulnerability”. One of the possible gains from that attack was the ability to read any file in the web directory (web.config included). Due to this, and the potential of a co-hosted site not running ASP.NET being hacked, I thought adding the extra layer was worthwhile.
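The key-derivation half of that extra layer can be sketched with a standard PBKDF2 call (illustrative Python, not HttpClone's actual .NET code): the password stored outside the web root is stretched into the key that encrypts the private key file, so reading the encrypted file alone gains an attacker nothing.

```python
import hashlib

def derive_keyfile_key(password: bytes, salt: bytes) -> bytes:
    # Stretch the password so an attacker who reads the encrypted key
    # file still has to brute-force the separately stored password.
    # 200,000 iterations is an arbitrary illustrative work factor.
    return hashlib.pbkdf2_hmac("sha256", password, salt, 200_000)
```

The salt can live next to the encrypted key file; only the password needs to be kept elsewhere. The actual encryption of the key file (e.g. AES) would be done with the derived key via a crypto library.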

For HttpClone's purposes it's still not necessary to further protect the server's private key. This is due in part to the fact that the data being sent (a copy of a public website) is not private and does not need to be secured. In fact, the only reason for involving cryptography at all is authorization, not privacy.

 

Yes, this site is still using WordPress; in fact, I'm writing this in WordPress right now. The interesting thing is that I've completely uninstalled WordPress and MySQL from my production server. I know, crazy, huh?

So if I've piqued your curiosity, you'll want to stay tuned. Right now I don't have time for a lot of details, but what I can tell you is this:

  • Search still works
  • RSS still works
  • Postback still works
  • All the WordPress admin goodies still work

 
How? Well, I'm using a new project I started called HttpClone to create a snapshot of the site. From there I can do pretty much anything I want to it, including:

  • Rename it to a different domain
  • Add, edit, or remove content
  • Remove and insert html tags
  • Index the content with Lucene.Net
  • View, modify, validate and track down links

 
Once I’m happy with the changes being made I run a publish command and presto-changeo it’s live!
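The link-tracking capability in the list above amounts to walking the captured HTML and collecting every href. As a rough sketch of the idea (in Python's standard html.parser; HttpClone itself is .NET and uses its own parser):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    # Collect every href so broken or off-site links can be reported.
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

collector = LinkCollector()
collector.feed('<a href="/about">About</a> <a href="http://example.com/">Ext</a>')
```

Once the links are gathered, validating them against the set of captured URLs flags anything broken before the publish command makes the snapshot live.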

The project is definitely still 'Alpha' material, but the server side of things should be solid enough for most uses. I've actually been running this site on early versions for two weeks now without issue. Into the bargain, the site should be around 3x-5x faster than when WordPress was serving the content directly.