Crowdstruck (Windows Outage) - Computerphile

Nearly nine million Windows machines were taken out by the CrowdStrike problem in July 2024, but why was the impact so problematic? Dr Steve Bagley and Dr Mike Pound of the University of Nottingham discuss the problem.

This video was filmed and edited by Sean Riley.

Comments

"Well, well, well. Tell me, young gentlemen, why is it always you two when something bad happened??"

luicecifer

I got dragged into this and I'm now at 48 hours of overtime. Thanks CrowdStrike.

james_chatman

So there were 3 separate failures from CrowdStrike.
1. The kernel driver didn't have proper input validation
2. The channel file was broken
3. The testing was so abysmal that they didn't notice before sending the update out to customers.

TheAnonymmynona
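
Points 1 and 2 above are linked. A parser that checks every count and offset before using it can reject a broken content file instead of reading out of bounds, and in kernel mode an out-of-bounds read takes down the whole machine. Below is a minimal sketch in Python that assumes an invented binary layout. `MAGIC`, `HEADER`, `FIELD`, `SUPPORTED_VERSION`, `ChannelFileError` and `parse_fields` are all hypothetical names, and the real channel file format is not described in this thread.

```python
import struct


class ChannelFileError(ValueError):
    """Raised when a content file fails validation (hypothetical format)."""


# Hypothetical layout: magic (4 bytes), version (u16), field count (u16),
# then one (offset, length) u16 pair per field, then the field payloads.
HEADER = struct.Struct("<4sHH")
FIELD = struct.Struct("<HH")
MAGIC = b"CHNL"         # hypothetical magic number
SUPPORTED_VERSION = 1   # hypothetical


def parse_fields(blob: bytes, expected_fields: int) -> list[bytes]:
    """Return each field's payload, or raise before touching any out-of-range byte."""
    if len(blob) < HEADER.size:
        raise ChannelFileError("file shorter than header")
    magic, version, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ChannelFileError("bad magic")
    if version != SUPPORTED_VERSION:
        raise ChannelFileError(f"unsupported version {version}")
    # The caller states how many fields it will read; any mismatch is rejected up front.
    if count != expected_fields:
        raise ChannelFileError(f"expected {expected_fields} fields, file has {count}")
    table_end = HEADER.size + count * FIELD.size
    if table_end > len(blob):
        raise ChannelFileError("field table runs past end of file")
    fields = []
    for i in range(count):
        offset, length = FIELD.unpack_from(blob, HEADER.size + i * FIELD.size)
        # Every payload must lie entirely after the table and inside the file.
        if offset < table_end or offset + length > len(blob):
            raise ChannelFileError(f"field {i} out of bounds")
        fields.append(blob[offset:offset + length])
    return fields
```

With these checks, a file of all zero bytes such as `parse_fields(bytes(64), 2)` raises `ChannelFileError("bad magic")` instead of being interpreted. A driver written in C would need the same checks and could then skip the bad update rather than crash.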

McAfee did something similar several years ago. A bad definition quarantined core system files. The McAfee CTO from that era is now CEO at Crowdstrike.

leighhaynes

The real worry is the lack of QA at enterprise companies. A state actor infiltrating one of these orgs would be absolutely devastating.

oourdumb

Heh the BSOD at 0:40 is cool
"For more information about this issue and possible fixes, do not ask us"

solimmsks

Nice touch with the 13.37% in the BSOD 😁

satysin

"If you put everything on the cloud, and then the cloud's not there, you've got nothing."

era_s

The problem is rolling out an update (that might not have been tested so well) TO EVERYONE ON THE PLANET AT THE SAME TIME. I can't believe CrowdStrike is operating like this. If you did a phased roll-out to a couple of smaller customers first, and then monitored whether the update had any glaring issues, this whole situation could have been averted.

wcmatthysen
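
The phased roll-out described above can be sketched as a loop over deployment rings that stops as soon as one ring looks unhealthy. This is a minimal Python sketch. `Ring`, `staged_rollout`, the `deploy`/`healthy` callbacks and the 1% failure threshold are all hypothetical. A real system would also wait for a soak period between rings and roll back the hosts that were already hit.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Ring:
    """A group of hosts that receives the update together (hypothetical)."""
    name: str
    hosts: Sequence[str]


def staged_rollout(
    rings: Sequence[Ring],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> Tuple[List[str], str]:
    """Deploy ring by ring, smallest first; halt before the next ring if too many hosts fail."""
    deployed: List[str] = []
    for ring in rings:
        failures = 0
        for host in ring.hosts:
            deploy(host)
            if not healthy(host):  # e.g. the host never reported back after rebooting
                failures += 1
        deployed.append(ring.name)
        rate = failures / len(ring.hosts) if ring.hosts else 0.0
        if rate > max_failure_rate:
            # Later rings never receive the update.
            return deployed, f"halted after {ring.name}: {rate:.0%} of hosts unhealthy"
    return deployed, "completed"


# Usage: an update that breaks every host it touches stops at the two-host canary ring.
rings = [Ring("canary", ["c1", "c2"]), Ring("everyone", [f"h{i}" for i in range(1000)])]
print(staged_rollout(rings, deploy=lambda h: None, healthy=lambda h: False))
# (['canary'], 'halted after canary: 100% of hosts unhealthy')
```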

"As I said online, you should just go outside and enjoy the sunshine."

Okay, but what are people in the U.K. supposed to do?

IstasPumaNevada

In the modern version of Battlestar Galactica, Admiral Adama absolutely refused to have Galactica networked to other systems and ships in the fleet because of the risks to its critical IT systems. Yet here we are, allowing a rootkit to operate unconstrained on millions of machines. Fun times ahead.

vincei

The frowny face is absolutely necessary

Arthur-

I swear this is only the beginning for tech companies that are losing valued senior staff over the many, many decades...

LunarcomplexMain

Perfect storm: no fuzz testing of the driver code, no staged deployment, no OS blue/green boot partition.

piranniayt
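
Fuzz testing, mentioned above, means feeding a parser large numbers of randomly mutated inputs and flagging any input that makes it fail in an uncontrolled way. This is a minimal Python sketch that assumes a toy length-prefixed record format. `parse_records` and `fuzz` are hypothetical. Python raises an exception instead of crashing on a bad read, so here any exception other than `ValueError` stands in for a crash. Coverage-guided fuzzers such as AFL++ or libFuzzer do this far more thoroughly on native driver code.

```python
import random
import struct


def parse_records(blob: bytes) -> list[bytes]:
    """Toy parser (hypothetical format): repeated [u16 little-endian length][payload]."""
    records = []
    pos = 0
    while pos < len(blob):
        if pos + 2 > len(blob):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        if pos + length > len(blob):
            raise ValueError("record runs past end of input")
        records.append(blob[pos:pos + length])
        pos += length
    return records


def fuzz(parser, iterations: int = 10_000, seed: int = 0) -> list:
    """Mutate a valid input at random; collect inputs that raised anything but ValueError."""
    rng = random.Random(seed)  # fixed seed so any failure can be reproduced
    seed_input = b"\x03\x00abc\x02\x00de"  # two valid records: b"abc" and b"de"
    crashes = []
    for _ in range(iterations):
        data = bytearray(seed_input)
        for _ in range(rng.randint(1, 4)):
            choice = rng.random()
            if choice < 0.5 and data:
                data[rng.randrange(len(data))] = rng.randrange(256)  # overwrite a byte
            elif choice < 0.8:
                data.insert(rng.randrange(len(data) + 1), rng.randrange(256))  # insert a byte
            elif data:
                del data[rng.randrange(len(data))]  # delete a byte
        try:
            parser(bytes(data))
        except ValueError:
            pass  # rejected cleanly: the behaviour we want
        except Exception as exc:  # anything else stands in for a crash
            crashes.append((bytes(data), exc))
    return crashes
```

Running `fuzz(parse_records)` should return an empty list because every bad input is rejected with `ValueError`. A deliberately unchecked parser such as `lambda b: b[b[0]]` quickly produces `IndexError` entries.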

I was stuck in Atlanta's airport because of this. It was absolute madness, and everyone who talked about it, whether from the airline or among the passengers, said it was a Microsoft issue. That's all most people are going to remember.

BruceAngus

I was waiting for this video with extreme excitement for the last 2 days. I jumped on YouTube as soon as I saw the notification.

adityavardhanjain

If Dr Bagley and Dr Pound had a podcast, I'd definitely listen to them talk for hours lol.

bilalsadiq

Software running in the kernel pretending to be a driver, when in reality it is a parser, what could go wrong?

wily_rites

The fix is simple: do not push untested code onto live systems where it will run as part of a must-run-to-boot, kernel-level driver. Run it on a test system first. And never trust a 'security company' that says you should do otherwise (except in rare cases, such as a very bad zero-day being exploited, where it's a gamble either way). If they allowed this for a run-of-the-mill, non-emergency update, then they don't know cyber security and safety well enough to protect a home gaming system, let alone major systems. This goes past gross incompetence to the point where I wouldn't blame anyone for suspecting malice, though I personally think it was "we don't screw up, we stop screw-ups"-level hubris.

kaseyboles

When talking about this incident, it's worth remembering that hospitals were affected and some people may have died because of this. So it's all well and good to say that when everything goes down you should go outside and touch grass. But we also need to think seriously about whether we're doing enough to ensure software safety. We take it far less seriously than, for example, car safety: when a new model of car comes out, it has to go through all kinds of testing to ensure its safety. But we are doing nothing to ensure software safety; we are just 100% trusting the vendors. I've been a professional software engineer for 25 years and have long thought that the current approach is madness, and incidents like this one only make me more sure that we need standards that all critical system software must meet in its development, deployment and implementation.

CheddarKungPao