Slow RAID for the home network

Hard disk drives these days are cheap. Too cheap, in a way: while we love paying 30 cents/GB, the reliability is getting pretty poor. Doing backups, especially automatic backups, is a must, but what about RAID?

One of the problems with RAID, at least RAID-5, is that you need 3, and ideally 4 or 5, drives in a machine. That's a lot of drives, a lot of power, a lot of heat and a lot of noise. And many machines have only two IDE controllers, so they can barely handle 3 drives and can't readily take more even if they had the slots and power for them.

So I propose a software RAID-5, done over a LAN with 3 to 5 drives scattered over several machines on the LAN.

Slow as hell, of course, since you have to read and write your data over the LAN, even at 100 Mbit/s. Gigabit would obviously be better. But what is taking up all this disk space? It's video, music and photos: things which, if just being played back, don't need to be accessed very fast. If you're not editing video or music in particular, you can live with having it on a very slow device. (Photos are a bigger issue, as they sometimes need fast access when building thumbnails etc.)
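To put rough numbers on "slow but good enough", here is a small Python sketch that counts how many playback streams a shared link could carry. The 8 Mbit/s video bitrate and the 60% usable-link figure are assumptions for illustration, not measurements; 320 kbit/s is the top MP3 bitrate.

```python
# Back-of-the-envelope: concurrent playback streams over a shared LAN link.
# All rates are in kbit/s; integers keep the arithmetic exact.

LINKS_KBIT = {"100 Mbit/s Ethernet": 100_000, "Gigabit Ethernet": 1_000_000}

# Assumed per-stream bitrates (illustrative, not measured).
STREAMS_KBIT = {"MP3 at 320 kbit/s": 320, "video at 8 Mbit/s": 8_000}


def streams_that_fit(link_kbit: int, stream_kbit: int, usable_pct: int = 60) -> int:
    """Streams that fit if only `usable_pct` percent of the raw link rate is
    available after protocol overhead, RAID traffic and other users."""
    return (link_kbit * usable_pct) // (100 * stream_kbit)


if __name__ == "__main__":
    for link, link_kbit in LINKS_KBIT.items():
        for stream, stream_kbit in STREAMS_KBIT.items():
            print(f"{link}: about {streams_that_fit(link_kbit, stream_kbit)} x {stream}")
```

Under those assumptions 100 Mbit/s carries about 7 such video streams and gigabit about 75, which is why playback-only media tolerate the LAN far better than editing does.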

This could even be done among neighbours over 802.11g, with suitable encryption. In theory.

Not that there aren't some major issues to overcome. The machines must be on most of the time. (A single disk can be taken out of a RAID temporarily, so a single machine hosting one disk can be turned off or rebooted, but not for long periods.) If you lose access to two disks (or to your LAN), you can't get at the data. And it's going to use a lot of your network capacity, though gigabit networking is starting to get cheap.

But the idea gets better: this is actually more reliable than RAID in a single machine. Common failures that RAID can't fix include fires, bad power supplies that fry all the drives, certain types of power spikes, overheating of the entire case, etc. In these cases you lose all the drives in your RAID. The distributed software RAID I propose could get you through these, even a limited fire. Offsite backup is still your best bet, though.
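The "two disks down and the data is gone" rule can be written as a tiny state check. This Python sketch assumes one array member per machine and treats the number of parity members as a parameter (1 for RAID-5, 2 for RAID-6); `ArrayState` and `array_state` are made-up names for illustration.

```python
from enum import Enum


class ArrayState(Enum):
    OPTIMAL = "optimal"    # every member reachable
    DEGRADED = "degraded"  # data readable, but redundancy reduced or gone
    FAILED = "failed"      # too many members missing; data unreachable


def array_state(members_online: list[bool], parity_members: int = 1) -> ArrayState:
    """Classify a parity RAID whose members are spread over machines.

    parity_members=1 models RAID-5 (survives one missing member);
    parity_members=2 models RAID-6 (survives two).
    """
    missing = members_online.count(False)
    if missing == 0:
        return ArrayState.OPTIMAL
    if missing <= parity_members:
        return ArrayState.DEGRADED
    return ArrayState.FAILED


if __name__ == "__main__":
    # Four machines, one of them rebooting: still readable, but degraded.
    print(array_state([True, False, True, True]))      # ArrayState.DEGRADED
    # Two machines off: RAID-5 loses access, RAID-6 would not.
    print(array_state([True, False, False, True]))     # ArrayState.FAILED
    print(array_state([True, False, False, True], 2))  # ArrayState.DEGRADED
```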

You could even RAID over the internet, which is OK for music, though would be pushing it for video.

Of course it's all software. You decide on the size of the RAID block, and everybody allocates a file or partition of that size on their system, or more than one if desired. For example, 2 machines might do well with a 4-disk RAID, 2 disks in each system, though this would not protect you from the loss of a whole system the way 4 independent systems would (losing one machine takes out two disks, more than RAID-5 can survive). You would use regular, non-RAID (or RAID-1) storage for the working parts of the filesystem (OS, etc.) and use traditional offsite backup for it. Each computer (which could be running any OS) would have a driver to talk to the network and make things happen. The system could make one big pool of space, or give everybody a personal space equal to 2/3 to 4/5 of the space they offered. And of course, if a drive or system fails, you must be ready to swap it out and rebuild. Rebuilding will suck up the network.
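The parity at the heart of RAID-5 is just XOR. A minimal Python sketch, assuming each block of a stripe lives in a file or partition on a different machine as described above, and keeping the parity in the last position for simplicity (real RAID-5 rotates it across members):

```python
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)


def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Data blocks plus one parity block; each element would go to a different machine."""
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild(stripe: list[Optional[bytes]]) -> list[bytes]:
    """Recover the single missing block (None) by XOR-ing all surviving blocks."""
    missing = [i for i, blk in enumerate(stripe) if blk is None]
    if len(missing) != 1:
        raise ValueError("RAID-5 can rebuild exactly one missing block per stripe")
    survivors = [blk for blk in stripe if blk is not None]
    repaired = list(stripe)
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired


if __name__ == "__main__":
    stripe = make_stripe([b"video-01", b"music-02", b"photo-03"])
    damaged = list(stripe)
    damaged[1] = None  # the machine holding "music-02" died
    print(rebuild(damaged)[1])  # b'music-02'
```

Rebuilding a whole disk means repeating this for every stripe, and every survivor's block has to cross the LAN, which is why a rebuild saturates the network.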

Comments

the stock linux kernel has a driver which does exactly such a thing, called the network block device. people are using it, plus software raid, in production environments. i've heard of it being used more often for raid-0 striping (e.g. to make petabyte filesystems out of a cluster of PCs for free) than for raid-5, but it is capable of the full range of raid types that linux's software raid driver supports. i tested it once with raid-1 just for giggles and the performance was acceptable for home directory service (mp3 streaming was probably my highest-bandwidth activity), but i only have one "always-on" server so it wasn't practical long term.
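For the raid-0 case mentioned above, striping boils down to an address calculation. This Python sketch assumes equal-size devices (e.g. one network block device export per PC) and a fixed chunk size; it illustrates round-robin striping in general, not the Linux md driver's code, and `raid0_locate` is a hypothetical name.

```python
def raid0_locate(logical_block: int, num_devices: int, chunk_blocks: int) -> tuple[int, int]:
    """Map a logical block number to (device index, block offset on that device).

    Consecutive chunks of `chunk_blocks` blocks are dealt round-robin across
    equal-size devices, the basic RAID-0 striping pattern.
    """
    chunk, within = divmod(logical_block, chunk_blocks)
    device = chunk % num_devices            # which PC / export holds this chunk
    chunk_on_device = chunk // num_devices  # how many of its chunks come before it
    return device, chunk_on_device * chunk_blocks + within


if __name__ == "__main__":
    # Three exports, 4-block chunks: blocks 0-3 on device 0, 4-7 on device 1, ...
    for lb in (0, 5, 11, 13):
        print(lb, raid0_locate(lb, num_devices=3, chunk_blocks=4))
```

Note that losing any one PC loses a slice of every large file, so raid-0 over the network trades all redundancy for capacity and speed.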

Well, I'm always discovering packages for Linux I didn't know existed. This is most of the way there, though in the vision I articulated it also runs on other OSes (indeed, many people don't have 3 or 4 Linux boxes but do have extra Windows boxes). I also had the idea that it could be shared and efficient for all the machines hosting it. I.e., in the example above with 2 machines holding 2 disks each, one machine is the host and the other the client. If the client wants to write a block, it has to send it to the host, which then sends the distributed blocks back to the client, and likewise for reads. Ideally you want a system where each machine knows which blocks are local and which are remote.
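Here is one way a peer could tell local from remote blocks. This Python sketch assumes one array member per machine and uses a rotating parity placement similar to the common "left-symmetric" RAID-5 layout (treat the exact rotation as an assumption); `Placement`, `raid5_locate` and `is_local` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    stripe: int
    data_node: int    # machine holding the requested data block
    parity_node: int  # machine holding that stripe's parity block


def raid5_locate(logical_block: int, nodes: int) -> Placement:
    """Locate a data block in a RAID-5 whose members are whole machines.

    Each stripe holds nodes-1 data blocks plus one parity block; the parity
    rotates from machine to machine so no single node takes every update.
    """
    stripe, k = divmod(logical_block, nodes - 1)
    parity_node = (nodes - 1 - stripe) % nodes
    data_node = (parity_node + 1 + k) % nodes  # never equals parity_node
    return Placement(stripe, data_node, parity_node)


def is_local(logical_block: int, nodes: int, my_node: int) -> bool:
    """True if this machine can serve a healthy read without touching the LAN."""
    return raid5_locate(logical_block, nodes).data_node == my_node


if __name__ == "__main__":
    # Four machines: show where the first 6 data blocks land.
    for lb in range(6):
        print(lb, raid5_locate(lb, nodes=4))
```

A healthy read only needs `data_node`; a write also touches `parity_node`. So each machine can read its own local blocks with no LAN traffic at all, instead of routing everything through one host.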

You would probably need a central server to maintain the filesystem's consistency, though with a bit of work you could minimize the network traffic that generates.

Also, as fully envisioned, in a RAID-5 with one disk per machine and 3 or more machines, you want any machine to be able to go down. That requires a truly peer-based system.

Let's look at this conundrum, which is seamless management of files across computers. The confusion over where data gets stored and fetched from can be resolved by using remote login. Assuming we're talking about high-bandwidth networks here (a 100 Mbit LAN in my case), the 'cost' of SSH bandwidth is almost negligible. Even 10 Mbit connections give you very responsive UI behaviour over the network; I know this because I tried it. The exceptions are cases where you stream many large frames over the network (e.g. video, games).

I have never looked deeply into RAID technologies. Why not just scp (or rsync) the entire contents of a hard drive periodically? I maintain files on just a single machine, which I always SSH to from elsewhere. Its contents get mirrored on two other remote machines. Files that change on a daily basis are backed up to the SAN overnight using cron jobs. This simplifies life a great deal. I still see no compelling reason to use RAID, especially given the existence of high-speed networks.

With RAID-6 you can have TWO disks go down and your data is still accessible. If sharing a networked raid array with my roommates / neighbors, I'd certainly want this.

With RAID6 you can lose two disks and keep your data, but of course the CPU required is greater. That would mean you could run mini-ATX boxes with big disks and still have some chance of keeping your data intact.
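To see where RAID-6's extra CPU cost comes from: P is the same XOR as RAID-5, but Q multiplies every byte by a per-disk coefficient in the finite field GF(2^8) before combining. This Python sketch uses the polynomial 0x11d and generator 2 (a common choice for RAID-6); it computes P and Q only, omits the two-failure recovery math, and its bit-by-bit multiply is far slower than the table- or SIMD-based code real implementations use.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11D   # reduce back into 8 bits
        b >>= 1
    return result


def gf_pow2(i: int) -> int:
    """g**i for the generator g = 2."""
    r = 1
    for _ in range(i):
        r = gf_mul(r, 2)
    return r


def pq_parity(data_blocks: list[bytes]) -> tuple[bytes, bytes]:
    """P = XOR of the data blocks; Q = XOR of g**i * D_i over GF(2^8)."""
    size = len(data_blocks[0])
    p = bytearray(size)
    q = bytearray(size)
    for i, blk in enumerate(data_blocks):
        coeff = gf_pow2(i)
        for j, byte in enumerate(blk):
            p[j] ^= byte                 # RAID-5 style parity
            q[j] ^= gf_mul(coeff, byte)  # the extra per-byte work RAID-6 pays for
    return bytes(p), bytes(q)


if __name__ == "__main__":
    p, q = pq_parity([b"\x01\x02", b"\x04\x08"])
    print(p.hex(), q.hex())  # 050a 0912
```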

You could probably do this with some of the NAS drive cases I've seen. It would be easier if you could load the network block device software onto each one, and since most of them run Linux, that might be possible. We may even see it as a standard option...

I think a purpose-designed system would work better than trying to run RAID over the network, though. In theory you should be able to satisfy any RAID-5 request by sending full data from N-1 drives and a checksum from the last one (at a computational price on the requesting drive), which would lower your network demands.
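The network cost of RAID-5 writes depends a lot on how the parity is updated. This Python sketch counts block transfers for rewriting one data block, assuming the writing machine holds none of the affected blocks locally, and shows the read-modify-write parity identity (new parity = old parity XOR old data XOR new data). The strategy names are the usual ones; the counting model is a simplification.

```python
def rmw_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write update: new parity = old parity XOR old data XOR new data."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


def small_write_transfers(nodes: int) -> dict[str, int]:
    """Blocks crossing the LAN to rewrite ONE data block in a RAID-5 spanning
    `nodes` machines, assuming the writer stores none of the blocks itself."""
    return {
        # read old data + old parity, then write new data + new parity
        "read-modify-write": 4,
        # read the other nodes-2 data blocks, then write new data + new parity
        "reconstruct-write": (nodes - 2) + 2,
    }


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, small_write_transfers(n))
```

Under this model, full-stripe writes (large sequential copies of video files, say) avoid the reads altogether, since the parity can be computed from the new data alone.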

I've been fortunate enough not to have had any HD failures yet (knock on wood), so I don't know the answer to this: what's the typical failure mode for the newer, inexpensive drives? Is it really bit failures, randomly strewn across the device? Block failures? Complete failures?

Do we really need a RAID solution to address it? (Well, RAID does provide an idealised solution.) Or would continual (incremental) background backup (e.g. from a journalling file system, perhaps in the form of a log-structured file system), with "instant, on-demand restore", fill the bill? The latter, of course, introduces a time lag on backup, offers data compression opportunities, and likely means a momentary delay on restore.

Because such a system would buffer file churn, I/O load should theoretically be levelled out over a longer interval as well, making such a solution more feasible over limited-bandwidth pipes. A bit fault could be corrected fairly quickly; a complete HD failure would take some time to recover from, but would be recoverable.
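One simple way to get the incremental behaviour described above is to hash fixed-size blocks and ship only those whose hash changed since the last run. This Python sketch uses that swapped-in technique (it is not a journalling or log-structured filesystem, and not rsync's rolling-checksum algorithm); the 4096-byte block size and the function names are assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # bytes; an assumed block size


def block_hashes(data: bytes, block: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 of each fixed-size block (the last block may be shorter)."""
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]


def changed_blocks(old_hashes: list[str], data: bytes, block: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Blocks that are new or whose hash differs from the previous backup run."""
    changes = {}
    for idx, digest in enumerate(block_hashes(data, block)):
        if idx >= len(old_hashes) or old_hashes[idx] != digest:
            changes[idx] = data[idx * block:(idx + 1) * block]
    return changes


def apply_changes(backup: list[bytes], changes: dict[int, bytes], total_blocks: int) -> list[bytes]:
    """Patch the remote copy (a list of blocks) to match the new local data,
    truncating or extending it to `total_blocks` blocks."""
    patched = (backup + [b""] * total_blocks)[:total_blocks]
    for idx, blk in changes.items():
        patched[idx] = blk
    return patched


if __name__ == "__main__":
    old = b"a" * 8192
    new = b"a" * 4096 + b"b" * 4096 + b"c" * 100
    delta = changed_blocks(block_hashes(old), new)
    print(sorted(delta))  # [1, 2]: only the modified and appended blocks travel
```

Only the changed blocks cross the pipe, so a day's churn can trickle out in the background; restoring is the reverse, fetching whichever blocks the local copy lost.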

For that matter, are there any popular error-correcting codes that could supplement this? E.g. storing the ECC offsite while retaining the source data locally, and retrieving (only) the redundancy data to correct the local data when a local error is detected?
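As a toy illustration of keeping data local and redundancy remote, here is a Hamming(7,4) code in Python: the 4 data bits stay on the local disk, the 3 check bits live elsewhere, and fetching only the check bits is enough to find and fix one flipped data bit. Hamming(7,4) is chosen only because it is small; 3 check bits per 4 data bits would be far too costly for real storage, and it handles at most one flipped bit per 4-bit group. The function names are hypothetical.

```python
def parity_bits(d: list[int]) -> list[int]:
    """Hamming(7,4) check bits [p1, p2, p3] for data bits [d1, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4]


def correct_local(d: list[int], remote_parity: list[int]) -> list[int]:
    """Fix up to one flipped bit in the locally stored data bits, using
    check bits fetched from the remote copy."""
    fresh = parity_bits(d)
    s1, s2, s3 = (a ^ b for a, b in zip(fresh, remote_parity))
    position = s1 + 2 * s2 + 4 * s3  # 1-based codeword position, 0 = no error
    # Codeword order is p1 p2 d1 p3 d2 d3 d4, so data bits sit at positions 3, 5, 6, 7.
    data_index = {3: 0, 5: 1, 6: 2, 7: 3}.get(position)
    fixed = list(d)
    if data_index is not None:
        fixed[data_index] ^= 1
    return fixed


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    remote = parity_bits(data)  # stored on another machine
    damaged = [1, 0, 0, 1]      # third data bit flipped on the local disk
    print(correct_local(damaged, remote))  # [1, 0, 1, 1]
```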

Of course, if the nature of the typical failure (and thus the nature of the demand for data restoration) demands more immediate results, then a RAID solution would be the better approach.

Yes, continual backup would be good, as is RAID-1. RAID-5 has the advantage of using less disk. I.e., with five 200 GB drives in RAID-5 you get 800 GB of storage. With RAID-1 or another plain backup, not counting compression, you get only 500 GB.
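The capacity arithmetic generalizes easily. A small Python sketch, treating "mirror" as RAID-1 or a full backup copy (every byte stored twice), as in the comparison above; `usable_gb` is a hypothetical helper.

```python
def usable_gb(disks: int, disk_gb: int, scheme: str) -> int:
    """Usable space for `disks` equal drives of `disk_gb` GB under a redundancy scheme."""
    if scheme == "raid5":
        if disks < 3:
            raise ValueError("RAID-5 needs at least 3 disks")
        return (disks - 1) * disk_gb  # one disk's worth of parity
    if scheme == "raid6":
        if disks < 4:
            raise ValueError("RAID-6 needs at least 4 disks")
        return (disks - 2) * disk_gb  # two disks' worth of parity
    if scheme == "mirror":
        return disks * disk_gb // 2   # every byte stored twice
    raise ValueError(f"unknown scheme: {scheme!r}")


if __name__ == "__main__":
    for scheme in ("raid5", "raid6", "mirror"):
        print(scheme, usable_gb(5, 200, scheme), "GB")  # 800, 600, 500
```

The RAID-5 fraction (n-1)/n is also where the post's 2/3 to 4/5 figures come from, for 3 to 5 members.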

Of course you do not get the performance hits. I have had a lot of drive failures of late. The drives are cheaper, and the systems are running hotter as well, I suspect.

How fast do you think it could be with a separate gigabit NIC on every node? Every node is a fast computer, and the host (server) is a Core 2 Duo. 100 GB/s? More? What about latency? And worst of all: is it possible to connect more hosts?
