Virtual Machine Image library at EC2

Virtual machines are becoming very popular in the web hosting world. Particularly exciting to many people is Amazon.com's EC2, short for Elastic Compute Cloud: a large pool of virtual machines that you can rent by the hour. I know people planning to base whole companies on this system, because they can build an application that scales up by adding more virtual machines on demand. It's decently priced, and in most cases a lot cheaper than building the equivalent yourself.

In many ways, something like EC2 would be great for all the web sites that have to deal with the "slashdot" effect. I hope to see web hosts, servers and web applications naturally allow scaling through the addition of extra machines. This typically means round-robin DNS, a master server that redirects requests to a pool of servers, a caching front end that serves content generated by a pool of servers, or one of a few other methods. Persistent state that can't be kept in cookies requires a database shared among all the servers, which may make the database the limiting factor. Rumours suggest Amazon will release an SQL interface to their internal storage system, which is presumably highly scalable and would solve that problem.
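To make the redirecting-master approach concrete, here is a minimal sketch of a front server that hands out HTTP 302 redirects to a pool of backends in round-robin order. The class and method names (`RoundRobinRedirector`, `redirect`, `add_backend`) are hypothetical, and a real deployment would also need health checks and a way to drop dead servers from the pool.

```python
from itertools import cycle


class RoundRobinRedirector:
    """Hypothetical master server that spreads requests over a pool by redirect."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._backends = list(backends)
        self._next = cycle(self._backends)

    def add_backend(self, host):
        # A freshly started VM joins the rotation; rebuilding the cycle
        # restarts the rotation from the first backend.
        self._backends.append(host)
        self._next = cycle(self._backends)

    def redirect(self, path):
        """Return (status, Location) for a 302 pointing at the next backend."""
        host = next(self._next)
        return 302, f"http://{host}{path}"
```

On a traffic spike, each newly rented VM is passed to `add_backend`, and subsequent requests are spread across the enlarged pool.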

As noted, this would be great for small to medium web sites. Most can run on a single server, but if they ever see a giant burst of traffic, for example from a link on a highly popular site, they can bring up extra servers in minutes to share the load. I've suggested this approach for the Battlestar Galactica Wiki I've been using. Normally its load is modest, but while the show is in season, the traffic each week when an episode airs is predictably so heavy that they have to lock the wiki down. They have tried to solve this the old-fashioned way, by buying bigger servers, but that's a waste when they really only need high capacity one day a week, 22 weeks a year.
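A rough calculation shows how little of the year that peak capacity is actually needed, assuming for illustration that the peak lasts a full 24-hour day in each of the 22 broadcast weeks:

```latex
\text{fraction of year at peak} = \frac{22 \times 24\,\mathrm{h}}{365 \times 24\,\mathrm{h}} = \frac{528}{8760} \approx 0.06
```

Under that assumption, a server sized for the peak would sit largely idle about 94% of the year, which is the case for renting the extra capacity only when it is needed.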

However, I digress. What I really want to talk about is using such systems to get access to all sorts of platforms. As I've noted before, Linux is a huge mishmash of platforms. There are many revisions of Ubuntu, Fedora, SuSE, Debian, Gentoo and others out there: not just the current releases but all the past ones, in stable, testing and unstable branches. On top of that there are many versions of the BSD variants. I would love to see Amazon or some other virtual hosting provider offer pre-prepared virtual machines for all of these, rentable by the hour. Any developer could then say, "I want to try something on a Fedora Core 2 system," and get one quickly and cheaply. Generally this would mean shell access; graphical access through X is possible but would be a bit slow, though still doable. It should also be possible to compile applications and build packages, letting people who release software quickly prepare packages for many different platforms.
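One way to picture the library such a provider would keep is a catalog keyed by distro, release and branch. This is a hypothetical sketch: `ImageCatalog`, `register`, `find`, `releases` and the image IDs are illustrative names, not any provider's actual API.

```python
class ImageCatalog:
    """Hypothetical index of pristine VM images, keyed by (distro, release, branch)."""

    def __init__(self):
        self._images = {}

    @staticmethod
    def _key(distro, release, branch):
        # Normalise case so "Fedora"/"fedora" and "Core 2"/"core 2" match.
        return (distro.lower(), release.lower(), branch.lower())

    def register(self, distro, release, branch, image_id):
        self._images[self._key(distro, release, branch)] = image_id

    def find(self, distro, release, branch="stable"):
        """Return the image ID to boot, or raise LookupError if none is on offer."""
        try:
            return self._images[self._key(distro, release, branch)]
        except KeyError:
            raise LookupError(f"no image for {distro} {release} ({branch})") from None

    def releases(self, distro):
        """All (release, branch) pairs offered for one distro, sorted."""
        d = distro.lower()
        return sorted((rel, br) for (dist, rel, br) in self._images if dist == d)
```

A request like "Fedora Core 2" then becomes `catalog.find("fedora", "core 2")`, returning an image ID the provider could boot.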

Of course, you can build all these VMs on your own machine using any of the various virtualizers now available. It would also be nice to have a library of pre-made VM images to download, though these tend to be large, 1 to 2 GB each. There are some collections, but I have not found one that is anywhere near complete. Frankly, though, it may be a lot faster to use a pre-built remote VM than to download a 2 GB image.
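The download side of that tradeoff is simple arithmetic. Here is a sketch assuming decimal units (1 GB = 8000 megabits) and ignoring protocol overhead; the link speeds below are examples, not measurements.

```python
def download_minutes(image_gb: float, link_mbit_s: float) -> float:
    """Minutes to fetch an image of image_gb gigabytes over a link_mbit_s link.

    Decimal units, no protocol overhead, so real transfers run somewhat slower.
    """
    if link_mbit_s <= 0:
        raise ValueError("link speed must be positive")
    megabits = image_gb * 8000
    return megabits / link_mbit_s / 60
```

For example, `download_minutes(2, 8)` comes to about 33 minutes for a 2 GB image over an 8 Mbit/s line, and the same image over a 100 Mbit/s link still takes over two and a half minutes, before any time spent unpacking and booting it locally.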

The remote service would give you a virgin image and let you customize it. Ideally, as some virtualizers already allow, you could save only your differences from the original; otherwise you might need to save a whole new image. You could delete it when done, or keep it around if it's special to you. Ideally others could also publish VMs with extra features, though that raises trust issues.
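Saving only the differences is essentially copy-on-write: the base image stays read-only and changed blocks go to a separate overlay. Some real formats work this way, for example QEMU's qcow2 images with a backing file. The class below is a hypothetical, in-memory illustration of the idea, not how any particular virtualizer stores its data.

```python
class OverlayDisk:
    """Block-level copy-on-write overlay over a read-only base image.

    The base image is modelled as a list of equal-sized byte blocks. Writes go
    to a private dict, so only changed blocks ever need to be saved.
    """

    def __init__(self, base_blocks, block_size=4096):
        self._base = base_blocks          # never modified
        self._block_size = block_size
        self._delta = {}                  # block number -> new contents

    def read(self, n):
        # Prefer the user's copy of the block; fall back to the pristine base.
        return self._delta.get(n, self._base[n])

    def write(self, n, data):
        if len(data) != self._block_size:
            raise ValueError("writes must be whole blocks")
        if not 0 <= n < len(self._base):
            raise IndexError(n)
        self._delta[n] = bytes(data)

    def diff(self):
        """Blocks that really differ from the base: all a user needs to keep."""
        return {n: d for n, d in self._delta.items() if d != self._base[n]}

    def flatten(self):
        """A complete standalone image (the 'save a whole new image' option)."""
        return [self.read(n) for n in range(len(self._base))]
```

Keeping `diff()` is cheap when a user changes little; `flatten()` is the fallback when the result must stand alone.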

Of course it would be nice to have the same for Windows and Mac, but copyright problems prevent this unless Microsoft or Apple are directly involved. And Windows, at least, doesn't do much for you through a command line interface; you would need a remote desktop protocol like VNC, or Microsoft's own RDP, to make it useful, and that could reduce the value of this in the Microsoft world.

Building your own VM images is not that hard, but it's work: as much work as setting up almost any machine, and sometimes more if you are not familiar with the quirks of a particular VM.

This would also be handy for people with old machines that are too slow, or too short on disk space, to build new applications. An old laptop isn't a great place to do a lot of compiling, and it may not even have the development tools installed. Being able to rent a virtual machine ready to compile for the particular distro on the laptop could be very handy.

This is particularly true if the supplier were to offer a "distcc" cloud service. Distcc is a cute tool that lets you spread the compiling load for a large project over many machines. A rental distcc pool need not even run on virtual machines, which would make it a bit more efficient, especially in memory use. One could bring up a virtual machine, load one's source code, and compile it using a distcc list of real machines on the same network. In theory that could mean hundreds of servers, so a compile of hundreds of files could be done in a few seconds. Here you would want to rent time by the second, not the hour. You could then test the result on the VM, or download it to test locally as needed.
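To put rough numbers on the "few seconds" idea, here is a sketch that builds a distcc host list and estimates an idealized compile time and rental cost. The `HOST/LIMIT` entries follow distcc's `DISTCC_HOSTS` syntax; the helper functions, per-file compile time and hourly rate are assumptions for illustration. The time estimate is a lower bound, since it ignores preprocessing (which classic distcc does on the client), network transfer and linking.

```python
import math


def distcc_hosts(hosts, slots_per_host=2):
    """Build a DISTCC_HOSTS value such as "10.0.0.5/2 10.0.0.6/2"."""
    return " ".join(f"{h}/{slots_per_host}" for h in hosts)


def ideal_compile_seconds(n_files, n_hosts, slots_per_host, secs_per_file):
    """Idealized wall time if every file costs the same and the compile
    farm is the only bottleneck: files are compiled in parallel 'waves'."""
    waves = math.ceil(n_files / (n_hosts * slots_per_host))
    return waves * secs_per_file


def billed_cost(seconds_used, rate_per_hour, granularity_s):
    """Cost when usage is rounded up to whole billing units of granularity_s."""
    units = math.ceil(seconds_used / granularity_s)
    return units * granularity_s / 3600 * rate_per_hour
```

For example, 300 files at an assumed 2 seconds each across 100 hosts with 2 slots apiece gives `ideal_compile_seconds(300, 100, 2, 2.0) == 4.0`. At a hypothetical $0.10 per hour, `billed_cost(10, 0.10, 3600)` charges a full hour for 10 seconds of work, while per-second billing (`granularity_s=1`) charges 1/360 as much, which is why per-second rental suits this use.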

A Java Virtual Machine cloud might also be interesting, considering how many server apps are now written for the JVM.
