Everybody should have RAID and a filesystem to manage it

For many years, I have been using RAID for my home storage. With RAID (and its cousins) everything is stored redundantly so that if any disk drive fails, you don't lose your data, and in fact your system doesn't even go down. This can come at a cost of anywhere from about 25% to 50% of your disk space (but disk is cheap) and it also often increases disk performance. Some years ago I wrote about how disk drives should be sold in form factors designed for easy RAID in every PC, and I still believe that.

RAID comes with a few costs. One of them is that you need to do too much sysadmin to get it working right. The nastiest cost is there are some edge cases where RAID can cause you to lose all your data where you would not have lost it (or all of it) if you had not used RAID. That's bad -- it should never make things worse.

A few years ago I switched to one of the new filesystems which put the RAID-like functionality right into the filesystem, instead of putting that into a layer underneath. I think that's the right thing, and in fact, fear of layer violations is generally a mistake here. I am using BTRFS. Others use ZFS or one of a few other players. BTRFS is new and so its support for RAID-5 (which only costs 25-33% of your space and is fast) is too young, so I use its RAID-1, where everything is just written twice onto two different disks. Unlike traditional RAID, BTRFS will do RAID-1 on more than 2 drives, and they don't have to be all of equal size. That's good, though I ran into some problems with the fairly common operation of increasing the size of my storage by replacing my smallest drive with a much larger one.

The long term goal of such systems should be near-trivial sysadmin. The system should handle all drives and partitions thrown at it in a "just works" way. You give it any number of drives and it figures out the best thing to do, and adapts as you change. You should only need to tell it a few policies, such as how much need you have for reliability and speed and how much space you are willing to pay for it. The systems should never put you at more risk than you ask for, or more risk than you would have had with having just one drive or a set of non-redundant drives. That's hard, but it is a worthwhile goal.

But I think we could do more, and we could do it in a way that we get better and better storage with less sysadmin.

Multiple drives, but not too many

I think most users will probably stick to 2 drives, and rarely go above 3. The reality is that 4 or more is for servers and heavy users, because each drive takes power and generates heat. Adding an SSD to the mix is always a good idea, but it's not for redundancy.

The OS should understand what's happening and reflect it in the filesystem

The truth is not all files need as much redundancy and speed. The OS can know a lot about that and identify:

  • Files that are accessed frequently vs. ones not accessed much, or for a long time
  • Files that are accessed by interactive applications which cause those applications to be IO bound. (ie. slowed by waiting for the disk.)
  • Files that have been backed up in particular ways, and when.

Your OS should start by storing everything redundantly (RAID 1 or 5) until such time as the disk starts getting close to full. When that happens, it should of course alert you that it is time to upgrade your drives or add another. But it can also offer another option which you can explicitly ask for, namely to reduce the redundancy on files which are rarely accessed, have not been used for a while, and have been backed up.

It turns out, that's often a lot of the files on a disk. In particular, the thing that uses up most of the disk space for the ordinary user is their collection of photos and videos. Other than the few that get regular access, there is no actual need for RAID level redundancy on these images. If their own drive is lost, there is a backup where you can get them. They aren't needed for regular system operation.

The systems already know what files belong to the OS, and can keep them redundant, though most home users are not looking for 100% uptime, they really only want 100% data safety.

To do this right, programs need to tell the OS why they are accessing files. Your photo organizer possibly scans your photo collection regularly, but this scan doesn't make the files crucial to the system. My goal is not to have the users designate these things, though that is one option. Ideally the system should figure it out.

The system can also take the most important files, the ones that cause the system to block, and make sure they are both redundantly stored and found on SSD.
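
As a rough sketch of the kind of policy I mean, here it is in Python. The field names, the 180-day "cold" threshold and the two-copy default are all assumptions made up for illustration, not anything BTRFS or any OS implements today.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    days_since_access: int       # days since the file was last read
    backed_up: bool              # True if a backup copy exists elsewhere
    blocks_interactive: bool     # True if interactive apps wait on this file
    is_system: bool = False      # True for files that belong to the OS


def choose_placement(f: FileInfo, disk_nearly_full: bool,
                     cold_after_days: int = 180) -> dict:
    """Return how many copies to keep and whether one belongs on SSD."""
    # Files that make interactive programs wait: redundant and on SSD.
    if f.blocks_interactive:
        return {"copies": 2, "ssd": True}
    # While there is room, everything stays redundant.
    if not disk_nearly_full:
        return {"copies": 2, "ssd": False}
    # Disk nearly full: only cold, backed-up, non-system files drop to one copy.
    if f.backed_up and not f.is_system and f.days_since_access >= cold_after_days:
        return {"copies": 1, "ssd": False}
    return {"copies": 2, "ssd": False}
```

An old photo that has been backed up would drop to one copy only once the disk is nearly full. A file with no backup never loses its second copy.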

Easier backup

Backup needs to be easy and automatic. When systems boot up, they should offer to do backup for others who are nearby and semi-nearby, and then they should trade backup space. My system should offer space to others, and make use of their space for either general backup (if in the same house/company/LAN) and offsite backup (remote but with good bandwidth.) Of course, ISPs and other providers can also provide this space for money.

The key thing is this should happen with almost no setup by the user. One problem for me is that I can come back from a trip with 50GB of new photos, and they would clog my upstream for remote backup. The system should understand what files have priority, and if the backlog gets too much, request I plug in an external USB drive to offer a backup until the backlog can be cleared. Otherwise I should not have to deal with it. Of course, the backup I offer others does not need RAID redundancy. Instead, I should be queried regularly to prove I still have the backups, and if not, the person I am backing up should seek another place.
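
One simple way the "prove you still have it" query could work is a precomputed challenge and response. The sketch below is only an illustration under assumptions: the function names are hypothetical, the owner prepares a handful of single-use challenges while the (already encrypted) data is still at hand, and the holder can only answer by reading the whole stored blob.

```python
import hashlib
import secrets


def make_challenges(blob: bytes, count: int = 4) -> list[tuple[bytes, bytes]]:
    """Owner side: precompute (nonce, expected_digest) pairs before upload."""
    challenges = []
    for _ in range(count):
        nonce = secrets.token_bytes(16)
        expected = hashlib.sha256(nonce + blob).digest()
        challenges.append((nonce, expected))
    return challenges


def answer_challenge(stored_blob: bytes, nonce: bytes) -> bytes:
    """Holder side: hash the fresh nonce together with the stored data."""
    return hashlib.sha256(nonce + stored_blob).digest()


def verify(expected: bytes, answer: bytes) -> bool:
    """Owner side: constant-time comparison of the two digests."""
    return secrets.compare_digest(expected, answer)
```

Each nonce should be used once, since a holder who has seen a nonce could cache the answer and throw the data away. When the precomputed challenges run out, the owner would need the data again to make more.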

Encryption

Of course all remote backup must be encrypted by me. In fact, all disks should be encrypted, but too much desire for security can cause risk of losing all your data. Systems must understand the reduced threat model of the ordinary user and make sure keys are backed up in enough places that the chances of losing them are nil, even if it increases the chance that the NSA might get the keys. This is actually pretty hard. The typical "What was your pet's name" pseudo security questions are not strong enough, but going stronger makes it more likely there can be key loss. Proposals such as my friendscrow can work if the system knows your social network. They have the advantage that there is zero UI to escrowing the key, and a lot of work to recover it. This is the ideal model because if there is ZUI on storing it, you are sure it will be stored. Nobody minds extra work if they have lost all the normal paths to getting their key.
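
For the curious, one standard technique for spreading a key among friends so that any few of them can rebuild it is Shamir secret sharing. The sketch below shows that technique only. It is not a description of how friendscrow itself works, and the function names, the prime and the 3-of-5 threshold in the usage note are assumptions chosen for illustration.

```python
import secrets

PRIME = 2**521 - 1  # a Mersenne prime, comfortably larger than a 256-bit key


def split_secret(secret: int, shares: int, threshold: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for c in reversed(coeffs):          # Horner's rule, modulo PRIME
            result = (result * x + c) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange interpolation of the polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `split_secret(key, shares=5, threshold=3)`, each of five friends holds one point, and any three of them can hand their points to `recover_secret` to rebuild the key. Storing costs the user nothing, and recovery takes some work, which is the trade-off I am after.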

Comments

I've learnt, from painful experience, that redundant RAID arrays are for continuous availability. They are still a single point of failure, even with redundant drives.

Fully redundant, physically separated backups are essential: they can't be wiped by a power surge or controller failure :(

What I point out is that for most of us, RAID is not necessary for data that is rarely used and which is backed up. Like your photo and video archives, which is the main thing that makes people have multi-terabyte drives.

I feel the same way. RAID is for high availability. RAID controllers and extra disks add cost, complexity, and weight that desktop systems seldom need. I have worked in many offices that only back up specific working directories on a workstation, vice the whole machine. In the rare event of a drive (or other part) failure a spare box is swapped in, imaged, and the user is up and running in a few hours. Very few home or office users need the level of reliability or modest performance gain offered by a RAID, especially when weighed against cost and portability.

My experience is that drives do fail, and RAID has saved my bacon a couple of times. But it's also caused me trouble. But no amount of reminding makes people be reliable about their backups (especially offsites.) It has to be automatic. But even nightly automatic means you can lose a day, and most people don't have that.

That's why you want a filesystem (and backup tools) that understand what files are safely backed up and what files need the redundancy and what files need the speed.

In fact, if the way it worked was that not backing up caused you to run out of space (because all files not backed up were stored redundantly) this would probably make it more likely you would back up, because it would be the easiest way to get more space.

RAID is the wrong product/technology for home users. Yes, whatever solution you select needs to be easy to administer, but RAID is certainly not that. RAID is for non-stop operation, it’s still a serious single point of failure for the home user. There are already good RAID solutions if you are technically inclined; ZFS works well in the Linux world (BTRFS is way too flaky currently, but it’s still being developed) but clearly still not adequate for home protection. In the Windows world you have to have Windows Server to get software RAID, though you can use a BIOS controlled RAID controller for hardware RAID in the client version.
For a home environment where there are multiple desktops, laptops and tablets with critical data sprinkled through them you really need at least one physically separate backup copy and two would certainly be better. It’s worth a serious home user having a physical backup in home. For example I use 6TB of media and backup server that does have redundant storage (not RAID). I can also store device images in case a client device fails.
You can also use a backup service (Internet based, such as Carbonite: http://pcsupport.about.com/od/backup/a/online-backup-comparison.htm) to store another encrypted copy of critical data, and incremental backups done automatically this way can be very effective.
So for a home user/family a local storage server (could be NAS or a cheaper older PC) is a great step forward, and with duplication to a cloud based backup seems ideal. For those technically able it’s easy to make the storage server (hopefully with de-duplication to save space) available over the Internet.
In terms of RAID for performance; you are much more likely today to install an SSD (<$100) for your client devices (as few have the capability of multiple drives). Fiddling with RAID for performance is almost out of scope.

So indeed, the very specific architecture of RAID is more for servers with uptime requirements. What is important is that there be backups, and they take no effort to create, and not too much effort to get stuff from. And in addition that there be offsite backups in case of fire etc. So I am calling it RAID when really I want the redundancy.

Though RAID also gives you extra speed, and that's worth it. In addition to what I suggest above, a good multi-disk FS should not only notice data that's frequently accessed and put it on SSD, it should put it on as many drives as possible. For example, with my 3 disks done as RAID 1, the most popular blocks should actually be on all 3 drives, and the least popular blocks on only one drive (but backed up elsewhere of course.) That way when I need the popular block, there are 3 drives which could go get it for me, or 3 drives could get the next 3 cylinders worth of data in a big read.

"Unlike traditional RAID, BTRFS will do RAID-1 on more than 2 drives, and they don’t have to be all of equal size."

I've used VMS at work for 25 years, and at home for more than 15. Host-based volume shadowing is a RAID-1 implementation. Supports up to 6 drives, which can be of different sizes. (Of course, the size of the virtual volume cannot be larger than the physical size of the smallest disk.) The size of virtual drives can be increased on the fly (no down-time). Members of such a shadow set can be at different geographical locations. Combined with VMS's shared-everything cluster concept, there is no more robust solution.

" In particular, the thing that uses up most of the disk space for the ordinary user is their collection of photos and videos."

Indeed. While I have RAID-1 for all disks, I like separate disks for separate things (system, user, third-party software, data, scratch), even though these days everything would fit on one drive, primarily because backup policies are different.

Of course, VMS is not for the typical home user. It all depends on context. As Ringo said when a reporter asked the Beatles what they thought of the new one-piece swimsuits, "we've been wearing them for years".

I am not sure what you mean when you say "The size of the virtual volume cannot be larger than the physical size of the smallest disk."

In my RAID-1, I have 3 disks of 3TB, 4TB and 6TB. The resulting virtual volume is 6.5TB, which is larger than any of the disks, not just the smallest one. (It is half the size of the total disks which is as large as a RAID-1 could be.) That's the point of more modern RAID. Old RAID used to limit you as you say.

RAID-1 is mirroring: identical copies on more than one disk. With nothing else involved (i.e. some mixture of RAID levels), then the virtual unit cannot be larger than the smallest disk, otherwise not all the data would fit on the smallest disk. My guess is that you don't have pure RAID-1, but rather have the physical disks partitioned then virtual disks created from those then these are mirrored as RAID-1.

While not pure "RAID 1" in its initial layout, the better concept mirrors blocks rather than disks. So on 3 disks (or more) each block is present on 2 disks, and any one disk can fail without loss of data. Failure of 2 disks means loss of most but not all of the data. (Metadata ideally is mirrored on every disk but I don't think BTRFS has that though it should.)

That way with 3 disks of different sizes you use all the disk space, as long as one disk is not larger than both smaller ones put together. Much better. Of course you can build a 3 disk triple-mirror style, which gives you tons of redundancy and speed, but of course at a huge cost. In my configuration, that would give me only 3TB of triple-stored data, while I have 6.5TB of double stored data. The latter is a better choice for most use.
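
The arithmetic behind those numbers can be sketched in a few lines of Python. This is an idealised upper bound that ignores metadata and allocation overhead, and the function names are made up for illustration.

```python
def two_copy_capacity(sizes_tb: list[float]) -> float:
    """Usable space when every block is stored on 2 different disks."""
    if len(sizes_tb) < 2:
        raise ValueError("need at least 2 disks to keep 2 copies")
    total = sum(sizes_tb)
    largest = max(sizes_tb)
    # Half the raw space, unless the largest disk is bigger than all the
    # others combined; then the others limit how much can be paired with it.
    return min(total / 2, total - largest)


def full_mirror_capacity(sizes_tb: list[float]) -> float:
    """Usable space when every block is stored on every disk."""
    return min(sizes_tb)


disks = [3, 4, 6]
print(two_copy_capacity(disks))     # 6.5
print(full_mirror_capacity(disks))  # 3
```

With disks of 1TB, 1TB and 10TB the first function returns 2, not 6, because most of the big disk has nothing to pair with.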

They also have raid 5 and 6 implementations in the works. You would need 4 disks to get a space efficient raid-5, again each block split over 3 disks, but not the same 3 disks, so using all the space.

This is a good way to do things (though 4 disks is a lot for a home system.) The reality of the home user is you run out of space and you buy a new disk and the new disk is always bigger than the old disks you have, and you probably replace your smallest disk because you don't really want to have so many disks in a home system for power, noise and size reasons.

So something like I have -- 6TB, 4TB, 3TB becomes very common. The choice between a 6.5TB RAID-1 and a 3TB raid 5 (even adding a 1TB raid-1 and 2TB of unredundant space) is a no brainer.

"While not pure “RAID 1” in its initial layout, the better concept mirrors blocks rather than disks."

OK, that explains it. (By the way, the VMS RAID-1 implementation, host-based volume shadowing, mirrors groups of 127 blocks. There is also a bitmap in memory with a bit for each such group which records whether anything has changed since the last reset. So, if a disk goes offline and comes back, only those groups where something has changed have to be copied, rather than a full copy. OK, this is more important in a multi-site setup rather than for the typical home user.)
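
The bitmap idea described above can be illustrated with a small Python sketch. Only the group size of 127 blocks comes from the comment. The class and method names are hypothetical and this is not VMS code.

```python
GROUP_SIZE = 127  # blocks per tracked group, the figure quoted above


class ChangeBitmap:
    """One flag per group of blocks, set when anything in the group is written."""

    def __init__(self, total_blocks: int):
        groups = (total_blocks + GROUP_SIZE - 1) // GROUP_SIZE  # round up
        self.dirty = [False] * groups

    def record_write(self, block: int) -> None:
        # Mark the group containing this block as changed.
        self.dirty[block // GROUP_SIZE] = True

    def groups_to_copy(self) -> list[int]:
        # Groups a returning disk must re-copy, instead of the whole volume.
        return [g for g, changed in enumerate(self.dirty) if changed]

    def reset(self) -> None:
        # Called once the returning member is back in sync.
        self.dirty = [False] * len(self.dirty)
```

If a member drops out while blocks 0, 126, 127 and 999 of a 1000-block volume are written, only groups 0, 1 and 7 need copying when it returns, rather than all 8.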

An advantage of identical disks is that one can split such a mirror set and have a consistent backup. It also makes it easy to clone a disk, such as a system disk.

I would say that being able to split is hardly worth the cost. And I am not sure cloning from a smarter RAID should take any longer than from a traditional one, though the software has to understand the FS. The modern FSs like ZFS and BTRFS that do RAID this way made what I think is the right decision -- to break through the layers. The layer violation here is good, not bad. The downsides are small and the upside big. With only 2 disks, nothing changes. With 3 disks it's hugely superior. Pretty good with 4, too, though you should be considering 4 disk RAID-5 if you want the most space, or RAID-6 if you want the most reliability and the sizes are similar.

While I do not believe it works that way, I think that 4 disk RAID-1 could be implemented to give the performance and reliability of RAID-10. There would be 4 drives to grab data from at once if done right. It could lose any one disk, and if you constrained the layout to look a lot more like RAID-10, you could even give it RAID-10's ability to lose 2 disks when they are the right 2 disks. But really if you need to handle losing 2 disks, RAID-6 is probably your best bet.

The 3 or more disk "RAID-1" that BTRFS does is consistent with the typical definition of RAID-1 (all data is mirrored to 2 or more disks) so it is quite reasonable to call it that. Only the older and more basic RAIDs required disks of equal sizes.

More to the point, I now believe it might be the best configuration of RAID for the typical user. If you are only going to have 2 disks, of course, you have no choice but classic RAID constrained to the size of your smaller disk. Most ordinary users, who are not building an array from scratch by ordering multiple identical sized disks, will find themselves with 2 or more disks of different sizes. As soon as they go to 3 disks, smart RAID-1 often should give them the same reliability and greater storage than they would get with RAID-5 on the 3 different sized disks, unless the small and large disk are very similar in size.

3-disk double-mirror also should offer a speed advantage, though I don't know if that's implemented in BTRFS. If you have 3 different blocks to fetch you can have 3 seek arms go out to fetch them and read them in parallel. That's better than RAID5's speed advantage. (A very clever RAID5 could be putting the 3rd disk to use on half of the next job while fetching the 2 blocks to create a desired block -- do any of them actually do that?)
