Software's HUGE Impact On The World | Crowdstrike Global IT Outage

Today a security update to a widely used cybersecurity product from a specialist company called CrowdStrike caused systems running Microsoft Windows to fail, preventing services of all kinds around the world from being delivered.

-

⭐ PATREON:

-

👕 T-SHIRTS:

A fan of the T-shirts I wear in my videos? Grab your own, at reduced prices EXCLUSIVE TO CONTINUOUS DELIVERY FOLLOWERS! Get money off the already reasonably priced t-shirts!

🚨 DON'T FORGET TO USE THIS DISCOUNT CODE: ContinuousDelivery

-

BOOKS:

and NOW as an AUDIOBOOK available on iTunes, Amazon and Audible.

📖 "Continuous Delivery Pipelines" by Dave Farley

NOTE: If you click on one of the Amazon Affiliate links and buy the book, Continuous Delivery Ltd. will get a small fee for the recommendation with NO increase in cost to you.

-

CHANNEL SPONSORS:

#softwareengineer #developer #crowdstrike #microsoft
Comments

5:00 They did revert the patch very quickly. But the affected machines are stuck on a blue screen and need manual intervention.

farquoi

Poorly tested security software is just as dangerous as the problems it purports to solve!

kermitzforg

I think that the Post Incident Review for this will be the most anticipated PIR in technology ever!

DeagleNZ

The CEO of CrowdStrike was CTO of McAfee when its big outage happened in 2010. You'd have thought he had more incentive than anyone to insist on good engineering practice.

PaulHewsonPhD

The craziest thing for me is on the client side. It means that all these big companies just update their software without doing any checks on what is coming into their systems. We are talking about highly critical systems like banks, airports, hospitals...

RenaudRwemalika
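For illustration of the point above, a minimal sketch of a client-side gate on vendor updates: anything new soaks in a small internal test ring before it is allowed anywhere near production. The function, the soak period, and the notion of a "test ring" are made up for the example, not any vendor's actual mechanism.

```python
# Hypothetical client-side gate on vendor updates -- illustrative only.
from datetime import datetime, timedelta, timezone

SOAK_PERIOD = timedelta(hours=24)  # illustrative: how long an update sits in the test ring

def allow_in_production(received_at: datetime, test_ring_survived_reboot: list[bool]) -> bool:
    """Admit a vendor update to the production fleet only after it has soaked in a
    small internal test ring for SOAK_PERIOD and every test machine came back up."""
    soaked = datetime.now(timezone.utc) - received_at >= SOAK_PERIOD
    return soaked and all(test_ring_survived_reboot)
```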

My first thought when I heard about it was 'haven't they heard of staged rollouts?' (canary releasing). It's always been funny to me that I tend to expect large companies to know all the best practices and follow them, while simultaneously, I know full well that every large company I have personally worked for or with has NOT followed best practices.

TimothyWhiteheadzm
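In the spirit of the comment above, a minimal sketch of what a staged (canary) rollout could look like on the vendor side. The `fleet` object and its methods are hypothetical, and the ring sizes and soak time are invented; the point is simply that each ring has to prove itself healthy before the rollout widens, so a failure stops things while the blast radius is still small.

```python
# Hypothetical staged-rollout loop -- not CrowdStrike's actual pipeline.
import time

RINGS = [0.001, 0.01, 0.1, 0.5, 1.0]  # fraction of the fleet reached at each stage
SOAK_SECONDS = 60 * 60                # observe each ring before widening

def staged_rollout(update, fleet):
    """Push `update` ring by ring; halt and roll back at the first sign of trouble.
    `fleet` is a hypothetical API offering sample/apply/healthy/rollback."""
    released = []
    for fraction in RINGS:
        ring = fleet.sample(fraction, exclude=released)
        fleet.apply(update, ring)
        time.sleep(SOAK_SECONDS)
        if not fleet.healthy(ring):  # e.g. hosts report in and are not crash-looping
            fleet.rollback(update, released + ring)
            raise RuntimeError(f"update failed at the {fraction:.1%} ring; rollout halted")
        released += ring
    return released
```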

The most baffling thing about all this is why they released the update globally at the same time.

Is this how they've always rolled out their updates in the past? If so, how has it not raised any red flags for their customers?

Smart people have already figured out how to avoid disasters like this. None of this is new! It's infuriating that CrowdStrike even managed to mess this up despite the wealth of knowledge available to us now.

aratilishehe

You missed the most important one... what the hell were they thinking, doing this on a Friday?!

PlanetJeroen

The more these incidents come to light, the more I see idiocracy becoming the norm in all aspects of society.

Ian_Carolan

2:46 It is a security incident. Availability is one of the 3 pillars of security. The others are confidentiality and integrity.

AxWarhawk

But what is also baffling, from what I have heard, is that CS sits at the kernel level… so how CS and system admins are permitting automatic push updates is beyond me.

llucos

It would be inconceivable that CrowdStrike doesn't at least have some form of canary testing capability. This seems to be a procedural problem; no new changes--EVEN THE FIXES--should be allowed to go wide without having first successfully passed the early stages of testing. This should be built into any and every platform.

yapdog
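Following on from the comment above, a tiny sketch of a promotion gate in which nothing, not even an emergency fix, goes wide unless every earlier stage has recorded a pass. The stage names are illustrative.

```python
# Hypothetical promotion gate -- stage names are made up for the example.
STAGES = ["unit", "integration", "cold_boot", "canary"]  # all must pass before "wide"

def cleared_for_wide_release(change_id: str, results: dict[str, bool]) -> bool:
    """A change -- hotfix or not -- ships wide only if every earlier stage passed."""
    missing = [stage for stage in STAGES if not results.get(stage, False)]
    if missing:
        raise RuntimeError(f"{change_id} blocked: stages not passed: {', '.join(missing)}")
    return True
```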

4:50 "Why didn't they roll back the change?"

Apparently it was a kernel-level piece of software that caused machines to get stuck in a boot loop. As a result, it is impossible to fix an affected machine remotely. IT guys worldwide are having to physically access each machine to remove the offending patch.

ToadalChaos

There is another question businesses will ask themselves... Can we trust a cloud-based solution, and what do we have to do to maintain reliability, or resiliency, should this happen again? And one final question: I do believe this could have been an honest mistake, but what if next time it's intentional, i.e. a cyber strike or a bunch of hackers? I have a friend who works in an auto parts store on the East Coast and he said that they couldn't open the stores or make sales. This seems to me to be a house of cards we've built our technology on.

raymitchell

They were milking interns to save money; now they are finding out.

brownlearner

Their idea of canary releases was probably based on timezones. Jokes aside, I do hope they make it very transparent in an open postmortem so we can all learn from it.

animanaut

CrowdStrike Falcon is an end-point security system. It monitors the system operation for malicious activities and has hooks into the OS kernel to enable it to do this. It was likely one of these kernel hooks that had the problem, causing the Windows OS to crash with a blue screen. With the OS dead, it's very difficult to back out the update. This illustrates the dangers of automated updates, which are a key part of any continuous delivery system. Obviously this problem should have been picked up in testing or some sort of "canary" update process, but it wasn't. It unfortunately killed the OS, making any sort of automated recovery impossible. Given the damage it has caused, it will be interesting to see if CrowdStrike has to pay for it. That would certainly incentivise software vendors to strengthen their testing and update processes.

kevinmcnamee

Been watching for about 4 years and I have to say, your T-shirt collection never disappoints 😂

akeenlewis

5:00 Because the update resulted in a BSOD at boot, at which point no further update mechanism could fix the issue and manual administration is required. Some machines may have a low-level management interface (like iLO) which could be used, but that's not available on most of the affected machines.

AxWarhawk

As a former systems developer, I find it absolutely unthinkable that they did not test a cold boot -- my guess is that they ran in debug mode with all sorts of profiler and test crap attached to the kernel process, and that just worked. But testing a debug build in a runtime environment with test crap is not the same as running a release build on a cold, clean system.

CallousCoder
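Picking up the cold-boot point above, a sketch of what an automated cold-boot check might look like: install the update on a clean machine running the release build (no debugger or profiler attached) and require it to survive a reboot before the update can go anywhere. The `vm` helper and its methods are hypothetical.

```python
import vm  # hypothetical helper that can provision clean virtual machines

def cold_boot_test(release_build, update):
    """Install `update` on a clean machine running the release build, then require
    the machine to survive a cold reboot with the sensor service running."""
    machine = vm.create_clean(release_build)  # fresh system: no debugger, no profiler, no test harness
    try:
        machine.install(update)
        machine.reboot()
        assert machine.wait_for_boot(timeout=300), "machine never came back after the cold boot"
        assert machine.service_running("sensor"), "sensor service did not start on the clean system"
    finally:
        machine.destroy()
```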