Wednesday, January 28, 2009

Cloud vs. Distributed

We hear a lot about "cloud computing" these days, with some people saying it's the wave of the future, and others, notably guys like Stallman, saying that it's "worse than stupidity." I've run across two essays warning of some of the pitfalls of the cloud idea: Khoi Vinh's cautious "A Cloud and a Prayer" and Jason Scott's profanity-laden diatribe "Fuck the Cloud".

But cloud is not the only hyped-name idea floating around. You also hear a lot about "distributed" things. In particular, distributed version control systems have rushed onto the scene in the forms of Git, Mercurial, et al. Beyond version control you don't hear much about the distributed idea, but I think it's a bigger deal than the cloud.

Distributed systems bridge the gap between the open, adaptable web and the closed p2p systems that were popular not too long ago. When it comes to DVCS, there is no server to maintain; everyone maintains their own copy. One can be the canonical branch, but it's functionally no different from any other branch. But distributed can be much more, and even much simpler, than that. I think the real power of distributed systems shows when you don't even need a webserver any more to have a web service. Obviously you'd need communication with the outside world, but the server may not need to be as all-encompassing as it once was.

Early examples of this are Google Gears and the newly launched beta of offline Gmail and Google Reader. Gears acts as a miniature private webserver whose job is to stay synchronized with the Gmail server, but the application itself runs locally. I think this kind of thing has the potential to be huge. One of the beautiful and powerful aspects of the web is the simplicity and the cross-platform-ness of developing websites. They can be as simple as raw HTML, or they can take advantage of one of a myriad of frameworks to do incredibly powerful things. The weakness is that you always have to rely on a server. What if you could host a website from your own desktop? These days you can, but you couldn't serve much traffic because of bandwidth issues. But what if your users downloaded the Gears app for your site and, instead of requesting a whole page, they requested just the data they need? This seems like the natural evolution of AJAX.
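
To make the "request just the data" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the template, the `render_page` function, and the JSON payload are hypothetical names, not anything from Gears or a real site. The only point is that the layout lives on the user's machine and the server ships a small data payload.

```python
import html
import json
from string import Template

# Hypothetical page layout, installed once with the local app.
PAGE_TEMPLATE = Template("<h1>$title</h1>\n<p>$body</p>")

def render_page(payload: str) -> str:
    """Render a page locally from a JSON data payload sent by the server."""
    data = json.loads(payload)
    # Escape the values so data from the server cannot inject markup.
    return PAGE_TEMPLATE.substitute(
        title=html.escape(data["title"]),
        body=html.escape(data["body"]),
    )

# Stand-in for what the server would send: data only, no markup.
payload = json.dumps({"title": "Cloud vs. Distributed", "body": "Served from a local copy."})
print(render_page(payload))
```

The payload here is a few dozen bytes, while the markup around it never crosses the wire again after the first install. That is the bandwidth argument in miniature.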

Another aspect of distributed systems is that you may only need a server to pick up changes, and at all other times can simply use a local copy. Take, for example, the new Python documentation, written in a framework called Sphinx. Sphinx takes raw source files written in reStructuredText and builds them into an HTML website or a LaTeX file for publication. Mostly it's focused on the HTML, and it produces a very nice result. So, you can download all the Python documentation, build it, and then use it locally just as if it were on the web. You can even search it easily, because at build time it indexes the documents and creates a JavaScript search tool. It behaves exactly the same when opened locally as it does when you view it over the web.
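
The build-time indexing trick is simple enough to sketch. The following is not Sphinx's actual index format or code, just a toy inverted index in Python under my own assumptions (the `build_index` and `search` names and the sample pages are invented) to show why search needs no server once the index ships with the pages.

```python
import re
from collections import defaultdict

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Build-time step: map each lowercase word to the pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return dict(index)

def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Lookup step: return the pages containing every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical pages standing in for built documentation.
pages = {
    "tutorial.html": "Lists and dictionaries in Python",
    "library.html": "The dictionaries module reference",
}
index = build_index(pages)
print(search(index, "dictionaries"))
print(search(index, "python lists"))
```

The index is built once, saved alongside the HTML, and every query after that is a local lookup.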

So, imagine that this kind of thing took off in other types of site. A news site could use this idea. You simply download the articles instead of the site itself. But how is that any different from an RSS feed? In some ways, not much, but it allows the site publisher to keep control of the look and feel of the site, rather than give it up entirely to a separate reader app. For example, the NY Times website could look exactly the same, but when you hit refresh, instead of loading the new homepage, it updates the articles and the layout order of the page already stored locally. Or Wikipedia could become like this. You could download all of Wikipedia and run the server locally. In the background, it updates itself with new changes and articles. I wonder how much space it would take to download all of Wikipedia, or at least all of the text? Probably a lot, but who cares? Storage space is cheap and only getting cheaper. Serving it locally gains you several things: 1) speed, 2) the ability to break away from the network and still be functional, and 3) a significantly reduced burden on the host.
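
Here is a rough sketch of what that background update could look like, again in Python and again entirely hypothetical: `LocalSite`, the `(revision, article_id, text)` change tuples, and the fake server log are my own inventions, not how the NY Times or Wikipedia actually publish changes. The local copy asks only for what changed since its last revision, and if the network is gone it just keeps serving the snapshot it has.

```python
from dataclasses import dataclass, field

@dataclass
class LocalSite:
    """A local snapshot of a site's articles plus the last revision seen."""
    articles: dict = field(default_factory=dict)
    revision: int = 0

    def sync(self, fetch_changes) -> bool:
        """Apply changes newer than our revision; keep the snapshot if offline."""
        try:
            changes = fetch_changes(self.revision)
        except ConnectionError:
            return False  # offline: the local copy stays usable as-is
        for rev, article_id, text in sorted(changes, key=lambda c: c[0]):
            if text is None:
                self.articles.pop(article_id, None)  # article was removed
            else:
                self.articles[article_id] = text
            self.revision = max(self.revision, rev)
        return True

# Hypothetical server-side change log: (revision, article_id, text).
SERVER_LOG = [(1, "a", "First draft"), (2, "b", "Second article"), (3, "a", "Revised")]

def fake_fetch(since: int):
    """Stand-in for a network call returning only changes after `since`."""
    return [change for change in SERVER_LOG if change[0] > since]

def offline_fetch(since: int):
    """Stand-in for a failed network call."""
    raise ConnectionError("no network")

site = LocalSite()
site.sync(fake_fetch)      # pulls revisions 1 through 3
site.sync(offline_fetch)   # fails quietly, snapshot untouched
print(site.revision, site.articles)
```

The host only ever sends the tail of the change log, which is where the reduced burden comes from.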

These kinds of distributed systems seem more empowering than the cloud, which is all about making your fast and powerful computers no more than dumb terminals, and leaving you at the mercy of the provider of the service. True distributed systems give you the data and the application, meaning you're not beholden to anyone to use it locally. If Wikipedia were to vanish tomorrow, you'd still have your last snapshot of it, rather than being left wondering where to turn.

Tuesday, January 20, 2009

Inaugural Hats?

Was it just me, or were there a lot of people rocking sweet hats at the swearing-in? In addition to an unusual number of cowboy hats, it seemed like there were more fedoras than I had seen since the last Humphrey Bogart impersonators' convention.

A New Dawn

As E.K. Franks used to say, "It's a great day to be alive!"