Hardware RAID is Dead and is a Bad Idea in 2022

Hot take? Maybe? Maybe not? idk I'm just the editor. ~Editor Autumn

-------------------------------------------------------------------------------------------------------------
Music:
Intro: "Follow Her"
Other Music:
"Lively" by Zeeky Beats
"Six Season" and "Vital Whales" by Unicorn Heads
Outro: "Earth Bound" by Slynk
Comments

In 40 years of IT, I have experienced two occasions where the RAID controller card went mad and wiped the data. I stopped relying on RAID, but backing up bejesusbytes of data and having a way to restore it without taking days is the real problem. I've recovered from a hardware disaster, but the business was out of action for days. The cloud has the advantage that you can blame somebody else.

__teles__

Great topic, and long live ZFS! I have been a bit curious about the claims made by GRAID, and this video really helped me understand it better, and of course better understand its shortcomings compared to ZFS & BTRFS.

LAWRENCESYSTEMS

It took me months of self-research and testing to get my head around the pitfalls you explained so clearly in a 20-minute video. Bravo. Most people think their RAID 5/6 array is safe, until it isn't and data recovery fails. Silent bitrot, scheduled parity checks, recovery, rebuild performance... as an individual I ended up between ZFS (on Linux) and Btrfs. Now I am on a long, very long journey to convert all that mess into a giant Ceph deployment, which is a different level of headache, but it seems to be a solution for availability, correctness, consistency and performance (in that order).

totojejedinecnynick

Before even watching the video, I'm guessing it's gonna be Wendell telling me to use ZFS

DrathVader

"ZFS has Paranoid levels of Paranoia." "My kind of file system." Every time I learn more of about the file system my TrueNAS box runs the more boxes it ticks off and the more I like it.

WillCMAG

Linus: "Holy sh!t, this implementation from Nvidia is FAST."
Wendell: "This implementation doesn't prevent data corruption."
;)

NickF

I appreciate the detailed explanations in this video! I am working on a storage solution for a small office and had already decided to go with ZFS, and this solidified my reasoning for doing so. It's been a LONG while since I last set up a RAID controller, and it appears we are moving backwards in functionality (for most solutions). Thanks, Wendell!

Techieguy

@Level1Techs: this is a very insightful video, Wendell. I wish I could give you two thumbs up, but YouTube only allows me to give one.

Yves_Cools

Those of us who lived in the enterprise with our massive SCSI, then SAS direct-attached, then finally FC SANs, iSCSI, etc. remember that the most fear-inducing moment was always when you lost power, lost a drive, and started what would at the time have been an offline rebuild, hoping you wouldn't lose more drives in the process. One time is all it takes to make you shudder as you hope the backup tapes actually have your data!

I just decommissioned an LTO6 library after one year of keeping it around for retention. My backups are all on Synology RS units now. No more hand-carrying tapes offsite either. Hyper Backup works well where various other schemes didn't. I now have backups of backups for two locations, and BTRFS goodness.

All because a long time ago I learned that the D in RAID actually stood for Dependable, and the RAI was for Really AIn't. Neither are tapes, unless you like rolling the dice on year-old magnetic bits. Don't ever trust your data to just be there; make sure you have multiple, verifiable backups and actually test recovery once in a while. It may suck to be down for a little while, but if you lose months of data, we call that a resume generating event, or RGE, in the computer janitor business. And you don't want to be that person.

Gryfang

This is exactly the kind of stuff *_everyone_* should be learning about in basic computer courses!
I mean... not even (some) enterprise solutions are really functionally enterprise-grade anymore???
Do we really live in such a fast-paced world that the only ones concerned with data integrity are massive data collection maniacs like Google?

Arexodius

Meanwhile in France, 90% of companies use RAID 5 across 5 or 6 disks, if not more, and the bosses answer: it costs less money, and we have backups anyway. Or even, as I have also heard: I have never seen issues with RAID 5 or 6. Because our country is from the stone age and wants to spend just enough for something to work, even if it isn't really reliable.

kevinlassure

Fascinating how much of an afterthought verifying the integrity of the actual data on the devices is. So much attention is paid to RAID's ability to let an array survive the loss of a device that we almost completely forget about consistency.

blivioniox

Great video, Wendell. However, as a Btrfs advocate myself, I think it's important to represent it properly so people know the difference. I personally believe Btrfs and ZFS, while they have a lot in common, serve two very different use cases: ZFS is the enterprise solution; Btrfs is the home/NAS filesystem.

First, Btrfs does not support per-file or per-folder redundancy options. The idea may have been tossed around, but Btrfs redundancy is only on a per-volume basis. Not even on a per-vdev basis, as ZFS offers; Btrfs has no concept of vdevs. I'm not sure I'd personally want a per-file redundancy option anyway, as that could become a management nightmare, but per-subvolume redundancy options would be nice. Alas, neither is a thing yet.

As for RAID5/6, consider it unusable on Btrfs. Just straight up don't use it; the mkfs and balance tools warn users for a reason if they try.
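If you do try, mkfs pushes back. Something like this (device names hypothetical, and the exact warning text varies by btrfs-progs version):

```
# Asking for raid5 data triggers a "has known problems" warning from mkfs.btrfs
sudo mkfs.btrfs -d raid5 -m raid1 /dev/sdb /dev/sdc /dev/sdd
```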

It needs a total rewrite to be usable (which means an on-disk format change is likely; I believe Western Digital was working on some fixes, but that is still a ways out). While Btrfs RAID5/6 will indeed "protect" you from bitrot, all it can do is avoid returning bad data; it is unable to repair all issues. As such, it's nowhere near as trustworthy as ZFS RAID-Z.

It's not resilient to the write hole issue at all, and what makes matters worse, it will lie to you about which device may be throwing corruption at you. To me, that's as good as not having device stats at all, since you're forced to rely on the disk itself to report issues yet again. Apart from the fact that the fs won't return bad data, it still forces you to go to a backup or play whack-a-mole figuring out which disk is causing the issue. And if the parity bits themselves become corrupt, scrub won't fix them; it only repairs corruption in the data itself.
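For reference, the tooling involved looks something like this (mount point hypothetical):

```
# Per-device error counters (write/read/flush/corruption/generation errors)
sudo btrfs device stats /mnt/pool

# Scrub in the foreground (-B) and print a summary when finished
sudo btrfs scrub start -B /mnt/pool
```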

Additionally, if data does go corrupt and the stripe that data sits in is later updated, then when Btrfs recomputes the parity bits for the stripe, the corrupt data will be reflected in the updated parity, destroying your only chance at recovery.

Now, to be clear, Btrfs' other RAID levels are indeed fine and great at preventing, identifying and fixing bitrot in all cases, but they do require some knowledge of management, as Btrfs will not automatically rebuild without you triggering it.

First, you need to be familiar with how it allocates data to know when a balance is needed versus a scrub.

A scrub will indeed scrub the filesystem and repair data using redundancy, but data is allocated on a "chunk" basis (chunks are usually 1GiB in size on most volumes), and it is these chunks that carry a redundancy profile (not really the volume overall). This is what people need to look out for.
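You can see how chunks are allocated and which profile each class uses with the filesystem tools (mount point hypothetical):

```
# Data/Metadata/System chunk classes and the profile each one uses
sudo btrfs filesystem df /mnt/pool

# More detail, including per-device allocation and unallocated space
sudo btrfs filesystem usage /mnt/pool
```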

If the array becomes degraded, Btrfs will not mount automatically without user intervention, because any new writes to the fs may not be redundant (it's impossible if you only have the minimum disks the profile requires, since the array is degraded; otherwise the data won't be balanced properly). If the allocator is unable to allocate a chunk to store the data redundantly, it will instead create a chunk with the "single" profile. And if the filesystem was already mounted and can't satisfy the allocator's requirements to create a redundant chunk, it's forced read-only (this is why people often suggest that RAID1 users on Btrfs run 3 disks instead of 2; otherwise high uptime is impossible).
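A degraded mount looks something like this (device and mount point hypothetical):

```
# Mount read-write with a device missing so the array can be repaired
sudo mount -o degraded /dev/sdb /mnt/pool
```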

Even after a drive is replaced on a Btrfs volume that has been mounted with the degraded option, you need to keep an eye on the chunks as reported by its management tools, to see whether you need to convert any non-redundant chunks with a balance to restore full redundancy.
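Roughly, that replace-and-cleanup dance goes like this (devid, devices and mount point hypothetical):

```
# Replace the failed device (devid 2 here) with a new disk
sudo btrfs replace start 2 /dev/sdd /mnt/pool
sudo btrfs replace status /mnt/pool

# Convert any chunks written as "single" while degraded back to raid1;
# the "soft" filter skips chunks already in the target profile
sudo btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/pool
```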

However, Btrfs is quite innovative in other ways, so home users in particular shouldn't write it off too quickly. It allows mixed-size disks to be used redundantly: if you have 2x2TB + 1x4TB disks in a RAID1 configuration, you get 4TB of usable space. You can still only lose one disk without losing the filesystem, but the data on the 4TB disk is balanced between the two 2TB disks. It also supports duplication on single disks for some added bitrot protection in those use cases, along with RAID10 (and RAID0 if you don't want redundancy, I guess). To get mirroring across three disks, you would need the RAID1c3 profile, which makes three redundant copies across the three disks, at which point you'd have only 2TB of usable space in the above example, but with the resilience of being able to lose two disks. There's also a RAID1c4 option if one wants it. Finally, shadow copies are possible with it, and Btrfs is a great open-source alternative to what Unraid provides when it comes to flexibility with disks.
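A minimal sketch of creating those layouts (device names hypothetical; the raid1c3/raid1c4 profiles need kernel 5.5 or newer):

```
# Mixed-size RAID1: 2x2TB + 1x4TB yields roughly 4TB usable
sudo mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd

# Or three-copy mirroring instead; survives losing two disks
# (-f overwrites the filesystem created above)
sudo mkfs.btrfs -f -d raid1c3 -m raid1c3 /dev/sdb /dev/sdc /dev/sdd
```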

Btrfs also supports converting between RAID profiles on the fly as you like. Wanna go from RAID1 to RAID10? Easy: just add the disks and rebalance to RAID10. If you have a RAID10 now and a disk fails that you can't replace right away, you can rebalance to RAID1 to get redundancy back without needing a replacement disk immediately. All this can be useful in some cases, and if RAID5/6 ever does get reworked, anyone running its current stable RAID profiles can easily convert to RAID5/6 later. Now, as cool as this functionality is, it's a bit of a niche; it's more a thing home operators would usually care about, not enterprise users, who would just install another disk or configure another vdev.
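That conversion is just a filtered balance (devices and mount point hypothetical):

```
# Grow a RAID1 volume and convert it to RAID10 in place
sudo btrfs device add /dev/sde /dev/sdf /mnt/pool
sudo btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt/pool
```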

It is overall a great choice for home/NAS use cases, and it's built into Linux in just about any distro. I use a 5-disk RAID10 array with it here and it has been serving me well. It's also a spectacular desktop filesystem for those who run desktop Linux; I wouldn't choose any other, as snapshots, compression, and incremental backups with send/receive are too much to pass up. There's a reason distros like Fedora and openSUSE use it by default on their desktop flavors. It really is great, but I just wanted to set people's expectations so they pick the right tool for the job ;)

BTW: for anyone who wants a true Linux alternative to ZFS, keep an eye on Bcachefs ;) ... Sorry for the long-winded post; I could make a series of videos discussing Btrfs, because it is my personal favorite Linux filesystem lol

jsebean

Would love to have a video on the topic "Welcome to ZFS and BTRFS - Your Wendell introduction and video guide", basically helping us migrate from NTFS/ext/etc. to these two.

solidreactor

B-b-but what if you only have a Raspberry Pi? Hardware RAID is like 10x faster if you need parity on a Pi.

JeffGeerling

I'm still learning and most of this went over my head, but this was a great video! I watched the GRAID LTT video and was pretty stoked about something like that. But big sad. Wendell, you are my IT janitor hero!!

Foiliagegaming

Wow, that was like a crash course on RAID. Since I really am an amateur at this, my takeaway is that we can forget hardware RAID controllers and move to a filesystem-based RAID platform called ZFS. The question is (provided my understanding is correct): do you have any more details on ZFS and how to implement it? Remember, I am a newbie regarding RAID.

Great video by the way. I'll be following more. Thanks.

davidmeissner

Once again we get bitten by marketing playing fast and loose, implying that their products solve problems for us that they don't. The storage media space seems absolutely crammed full of this sort of thing the past few years: WD hiding SMR in their NAS drive series; NAS units being open to cloud vulnerabilities even when you try to turn off all their stupid cloud extras; SSD manufacturers substituting different controllers, DRAM and flash chips on their shipping drives compared to what they sent out to get good reviews (always downgrades); and reliability numbers that I wouldn't trust, let alone rely on. Now I learn that some of the RAID volumes I've set up to prevent decay of precious files and memories are probably not protected against decay or loss at all.

RN

And now a word from our sponsor: RAID Shadow Legends

miff

My, how things have changed. I remember that not long ago, software RAID was constantly shit on.

JohnClark-ttbl