I Built a PC that CAN’T Fail… and You Can Too!


Our servers need to stay up and running throughout the day for us to keep doing what we do. So we built additional servers to eliminate any downtime if something were to crash. And we're here to show you that you can do it too.

Purchases made through some store links may provide some compensation to Linus Media Group.

FOLLOW US
---------------------------------------------------  

MUSIC CREDIT
---------------------------------------------------
Intro: Laszlo - Supernova

Outro: Approaching Nirvana - Sugar High

CHAPTERS
---------------------------------------------------
0:00 Intro
1:50 How did the cat teleport?
2:25 The servers
3:27 Installing the CPU
5:30 RAM and networking
7:26 The rack
8:47 Clustering
12:28 Does it work?
14:37 A proper demonstration
16:05 Virtualization magic
17:50 You can do it too
18:40 Outro
COMMENTS
---------------------------------------------------

You know what I love? The business model where the software is free for home enthusiasts but funded by sales of commercial licenses

michallv

"PC that can't fail." … "sponsored by Intel". Gold.

vvmbt

Idk if Linus or the staff will see this, but I had an idea for a video I thought would be useful (especially for myself). Have you thought about a video on PC maintenance? Idk if it has already been done, but something like "Here's what you should be doing for your PC health every month, 6 months, year, etc." Cleaning fans, reapplying thermal paste, updating BIOS/drivers, checking stressed connections, etc.

qwebb

I've been running an HA cluster at home for a year or two now. The downside is Proxmox doesn't understand "Hey! We have a power outage and there's only 5 minutes left on the UPS." It really doesn't shut down nicely: the cluster will keep migrating VMs while you try to do a clean power-off in the dark. For server-down maintenance, it's fantastic though.
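One hedged workaround, sketched under the assumption that your UPS daemon (apcupsd, NUT, or similar) can run a hook script on low battery and that the stock Proxmox `ha-manager` CLI is available; the resource IDs are illustrative:

```python
# Low-battery hook sketch: ask HA to stop managed VMs cleanly, then power
# off the node, so the cluster stops trying to migrate guests mid-outage.
# Assumes Proxmox VE's ha-manager CLI; vm:100 / vm:101 are made-up IDs.
import subprocess

HA_RESOURCES = ["vm:100", "vm:101"]  # hypothetical HA-managed resources

for sid in HA_RESOURCES:
    # "stopped" requests a clean guest shutdown instead of a relocation
    subprocess.run(["ha-manager", "set", sid, "--state", "stopped"], check=True)

# A real script would poll `ha-manager status` here until the guests are
# actually down; then shut the node itself down cleanly.
subprocess.run(["shutdown", "-h", "now"], check=True)
```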

DanielFSmith

0:01 That chair needs to be replaced before it sheds all over the office. Faux leather gets EVERYWHERE when it starts flaking off and once it starts flaking, it accelerates fast.

privacyvalued

Can't believe there isn't a jump to TechnologyConnections saying "through the magic of buying two of them"

ellivlum

A machine that can't fail, sponsored by Intel, who currently have countless CPUs failing in businesses and servers across the planet? Amazing timing.

HarryUK

This was my senior thesis at university!
I designed a distributed fault-coincidence-avoidance solution using Proxmox VE with DRBD as the VM backing storage.
It bounced the virtual machine across the cluster randomly to reduce the odds of a fault coinciding with the critical process.
It technically outperforms vSphere FT (but it is not a full fault-tolerance solution, so it's not directly comparable).

justbubba

It's funny to watch LTT slowly work through the last 50 years of server & data center innovation as they grow and run into every issue that originally spawned those innovations in the first place. Eventually they might actually arrive at present-day best practices.

autarchprinceps

Two things about Docker containers that can resolve the issue Jake mentioned:
A: Run a virtual machine just to host the Docker containers, and that VM will migrate around the hosts as needed.
B: Run the container in Kubernetes. Then you can configure load balancing, scaling, and other cool features (see the sketch below).
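A minimal sketch of option B, using the official `kubernetes` Python client to run a container as a three-replica Deployment so the control plane reschedules it when a node dies; the image and names are illustrative:

```python
# Create a 3-replica Deployment: if a node fails, Kubernetes restarts the
# lost pods on surviving nodes. Assumes a reachable cluster and a kubeconfig
# at the default location.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

container = client.V1Container(
    name="web",
    image="nginx:1.27",
    ports=[client.V1ContainerPort(container_port=80)],
)
template = client.V1PodTemplateSpec(
    metadata=client.V1ObjectMeta(labels={"app": "web"}),
    spec=client.V1PodSpec(containers=[container]),
)
spec = client.V1DeploymentSpec(
    replicas=3,  # three copies; losing one node is absorbed transparently
    selector=client.V1LabelSelector(match_labels={"app": "web"}),
    template=template,
)
deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="web"),
    spec=spec,
)
client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```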

gatisluck

"pc that can't fail, courtesy of intel"? lmao, given their recent issues

weird_autumn

A video about reliability sponsored by INTEL… huh

nk

My main takes from this video are:
1. Don't dye/bleach only the back of your head
2. Jake is looking good!

Gabu_

Three things:

1) I'm already running this at home with three OASLOA mini PCs, each of which sports an Intel N95 processor (4-core/4-thread), 16 GB of RAM, and a 512 GB NVMe SSD. Each node has dual GbE NICs, so I was able to use one of them for the clustering backend and then present the other interface as the front end. (Each node, at the time, was only about $154.)

2) My 3-node Proxmox HA cluster was actually set up in December 2023, specifically with a Windows AD DC, DNS, and Pi-hole in mind, but I ended up switching to AdGuard Home after getting lots of DNS over-limit warnings/errors.

(Sidebar: I just migrated my Ubuntu VM from one node to another over GbE. It had to move/copy 10.8 GiB of RAM, so that took most of the time. Downtime was in the sub-300 ms range; total time was about 160 seconds. See the back-of-envelope check sketched after this list.)

3) 100 Gbps isn't *that* expensive anymore. The most expensive part will likely be the switch, if you're using one. There are lower-cost switches in absolute terms, but if you can spend quite a bit more, you can get a much bigger switch and put a LOT more systems on the 100 Gbps network than a cheaper switch with fewer ports would allow. I run a 36-port 100 Gbps InfiniBand switch in my basement; I have either 6 or 7 systems hooked up to it right now, but I can hook up another 29-30 if I need to. On a $/Gbps basis, 100 Gbps ends up being cheaper overall.
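A back-of-envelope check on the sidebar's migration numbers (my own arithmetic; the 95% link-efficiency factor is an assumption):

```python
# Minimum time to push 10.8 GiB of RAM state across a 1 GbE link.
ram_gib = 10.8
link_bps = 1e9        # GbE line rate
efficiency = 0.95     # assumed TCP/framing overhead

bits = ram_gib * 1024**3 * 8
seconds = bits / (link_bps * efficiency)
print(f"theoretical floor: {seconds:.0f} s")   # ~98 s

# Live migration re-copies pages dirtied while the VM keeps running, so a
# ~160 s total with a sub-300 ms final cutover is entirely consistent.
```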

ewenchan

I just wanted to comment to say that Jake you’re looking good, my dude! You mentioned before about losing weight and it’s clear you’ve lost some more :) keep up the amazing work!

matthewjalovick

Having spent as much time as I have in my career troubleshooting DRBD issues, getting calls in the middle of the night about the dreaded split brain, etc., I would gladly trade some performance in order to not use DRBD.

Realistically you should have a cluster for storage and a cluster for compute, so you don't have to worry about using something like DRBD to keep things in sync. With that said, it's nice to finally see LMG moving closer to enterprise-level infrastructure!
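For anyone running DRBD anyway, a hedged early-warning sketch, assuming the DRBD 8.x `/proc/drbd` status format (DRBD 9 reports via `drbdadm status` instead):

```python
# Flag DRBD resources whose connection state suggests trouble; a resource
# sitting in StandAlone after a disconnect is the classic split-brain symptom.
import re

HEALTHY = {"Connected", "SyncSource", "SyncTarget"}

with open("/proc/drbd") as f:
    for line in f:
        m = re.match(r"\s*(\d+):\s+cs:(\S+)", line)
        if m and m.group(2) not in HEALTHY:
            print(f"resource {m.group(1)}: cs={m.group(2)} - "
                  "investigate before it becomes a 3 a.m. call")
```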

MrPudgyChicken

A video sponsored by Intel about a PC that can't fail, given Intel's current reliability issues, is so funny. I know these deals are sometimes a long process, but wow, Intel did not gain much from sponsoring this particular video.

wierdcreations

Intel and stability... interesting!

gdguy

I think yayIHaveAUserName mentioned this in the discussion forum you linked, but unplugging your server and having whatever VMs are on there go down actually means that those VMs have to be rebooted on other servers. So technically... those VMs are failing; they're just being restarted automatically. This matters if you're running a program that does not save state before it crashes, because you might lose all your progress. It might also matter because you could corrupt your filesystem if important write operations were happening at the time of the crash. Very cool technology, but the video title is not 100% achieved, in my opinion.

Also, the clustering section goes really quickly over fault tolerance (i.e., "quorum"), but I don't feel it was well motivated beyond saying that having two computers is not safe. Unless I misunderstood, the piece that seems to be missing is that this clustering software seems to be trying to handle Byzantine fault tolerance, where a computer could have a malicious user giving false data. Although that's out of scope for your video, it would be the reason 2 computers with one fault aren't enough to know the current, valid state of the system. Otherwise, why not trust the other computer to have the correct data? Simple redundancy would let you trust the one working computer as the source of truth.
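For what it's worth, Proxmox's quorum (via Corosync) is ordinary majority voting to guard against split brain during network partitions, not Byzantine fault tolerance: a node in a 2-node cluster can't tell a dead peer from a cut cable, so neither side may act alone. A minimal sketch of the majority rule:

```python
# Majority quorum: a partition may act only with a strict majority of votes.
def has_quorum(votes_present: int, total_votes: int) -> bool:
    return votes_present > total_votes // 2

# 2 nodes: a lone survivor holds 1 of 2 votes -> no quorum, so the cluster
# freezes rather than risking two "masters" writing the same disks.
# 3 nodes: any pair still has quorum, so one node can fail safely.
for total in (2, 3):
    for present in range(1, total + 1):
        print(f"{present}/{total} nodes visible -> quorum: {has_quorum(present, total)}")
```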

aeroragesys

From experience:
1) That is not enough RAM; I'd recommend at least twice as much. If you have memory-heavy workloads, I'd recommend even more.

2) You should really use a dedicated NIC for management and Corosync.

3) If you use network storage, you want jumbo frames and therefore a dedicated network (a verification sketch follows this list).

4) If you have heavy guest traffic, you should definitely use a dedicated NIC for that as well.
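A quick way to verify that point 3 actually took effect, sketched with psutil; the interface name is a placeholder for your storage NIC:

```python
# Check that jumbo frames (MTU 9000) are active on the dedicated storage NIC.
import psutil

STORAGE_NIC = "ens19"   # hypothetical interface name; substitute your own
EXPECTED_MTU = 9000

nic = psutil.net_if_stats().get(STORAGE_NIC)
if nic is None:
    print(f"{STORAGE_NIC}: interface not found")
elif nic.mtu < EXPECTED_MTU:
    print(f"{STORAGE_NIC}: MTU {nic.mtu} - jumbo frames NOT enabled")
else:
    print(f"{STORAGE_NIC}: MTU {nic.mtu} - OK")
```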

thomastom