CrowdStrike Exposes a Fundamental Problem in Software


Whew, what a disaster! I share my thoughts about the whole CrowdStrike situation and the fundamental problem that I think lies at the core of this. Let me know what you think about this in the comments.

🔖 Chapters:
0:00 Intro
0:11 Who is CrowdStrike?
0:45 Recap of Friday outage
1:11 Rant time
2:03 How it could have been avoided
2:53 A fundamental dichotomy
3:47 Things will get worse

#arjancodes #softwaredesign #python
Comments

Your kernel has crashed. No malicious code can be executed. Your computer is completely protected now. Thank you for choosing our company!

MysticCoder

I think the even more fundamental problem here is the security software mono-culture. I know CrowdStrike is big, but honestly, I was surprised when I heard in the news how broadly sweeping the impact was across companies and even across industries. If everyone's using the same software, that provides a ripe attack vector for hackers. 😒

ropro

It's partly about $$$ and partly about how everything nowadays is expected to happen at speed. Back in the day (30 years ago) I worked for a bank. We maintained a very large enquiries counter system. Before anything got pushed out to branches, it was tested for weeks. We had dozens of test engineers and they would run through every conceivable action. Then, and only then, would a release go out to a local branch. This would be tested in the wild for a week. Then a small group of branches for two weeks, then a larger group, then finally the main group. The result was that very few (if any) show-stoppers made it to production. This meant a slow cadence of releases, though. Also, this was a large project with extensive management backing, so cost was not really a factor (within reason).

This type of behaviour would never fly today. Everything has to be done on the cheap, with minimal testing, just "get it out there". I call it the "just get it f**king done" attitude - this is very common nowadays, especially among MSPs.

kwas

So we have the CrowdStrike option ENABLED so that CrowdStrike won't push the latest version of their software to us (we stay one version behind) - apparently they don't actually even check for this, so we got it anyway. Absolutely shoddy development :(

AMMullan

People seem to be overlooking the glaring fact that they pushed an update that was corrupted or failed its checksum, which means there was a wide-open vulnerability that would have allowed man-in-the-middle exploits or injecting modified files directly into the kernel…

whatcouldgowrong

The CEO of CrowdStrike, George Kurtz, was the Chief Technology Officer of McAfee in 2010, when a security update from that antivirus firm crashed tens of thousands of computers.

ying-ymut

My career has been spent leading engineering organizations. This is not a new issue or a unique issue. Bad driver code crashes systems. Because of that, the industry has created well-known and effective ways to prevent these problems. You've listed them.

The issue here is a company with widespread driver releases that failed to follow those practices. The free market has created a process for handling that, and it is called competition and consumer choice.

James-hbqu

The CrowdStrike disaster didn't strike because they needed to move fast, but because they obviously hadn't tested this specific update on a single Windows machine. If they had, they'd have immediately noticed it crashes. And they already made a similar mistake in April. That time it could be somewhat forgiven, because it only occurred on two Linux distributions that hadn't been in their test matrix.

on_wheels_

Most surprising is that PCs still don't use A/B installs of the OS, where you use one copy and update the other copy, then switch over to the updated copy, and you can switch back if the update fails for some reason. With disk space so cheap, you'd think every Linux/Mac/Windows PC would use that by now. In Linux at least you can revert to a prior kernel version.

ChristianSteimel
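
A minimal sketch of that A/B-slot idea, in Python purely for illustration; the slot names and the install/health-check callbacks are hypothetical placeholders, not any real OS API:

SLOTS = ("slot_a", "slot_b")

def update_and_switch(active, install, healthy):
    # Write the new image into whichever slot is currently idle.
    inactive = SLOTS[1] if active == SLOTS[0] else SLOTS[0]
    install(inactive)
    # Only flip the boot pointer if the freshly updated slot passes a health
    # check; otherwise keep booting the known-good slot, which stays untouched.
    return inactive if healthy(inactive) else active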

But you can have it both ways: it's called rolling updates. You don't deploy software to a billion endpoints in one go.

metamadbooks
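
A minimal sketch of what such a ring-based rollout could look like, assuming hypothetical deploy_to() and error_rate() helpers; it simply stops promoting the update once a ring starts failing:

RINGS = ["canary", "early_adopters", "broad", "everyone"]

def staged_rollout(update, deploy_to, error_rate, max_error_rate=0.01):
    for ring in RINGS:
        deploy_to(ring, update)                # push only to this ring
        if error_rate(ring) > max_error_rate:  # e.g. crash/BSOD telemetry
            return False                       # later rings never get the update
    return True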

Something that needs more highlighting in this issue is that companies have in recent years been offloading their IT resources while still adopting external, overseas-managed (i.e. managed in the US) solutions. Companies should always have an in-house team ready to respond to system failures. Informed, careful companies would only have had a couple of hours of downtime...

lumeronswift

A mechanism that rolls back an update after X failed boots would help a lot here. My router does this: it keeps a copy of the old firmware it can automatically revert to in case flashing a new firmware image bricks it.

SUSE's MicroOS does something similar by having a stateless OS and transactional updates that are snapshotted in the Btrfs file system. If it crashes and reboots, it'll automatically roll back to the snapshot from before the update while preserving user data.

askii
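
A minimal sketch of that revert-after-N-failed-boots guard; the counter file location is made up, and the snapshot rollback command is just an example of what a Btrfs/snapper-based system might invoke:

from pathlib import Path
import subprocess

COUNTER = Path("/var/lib/boot_attempts")   # hypothetical location
MAX_FAILED_BOOTS = 3

def on_boot_start():
    # Early in boot: if we've already failed too many times, roll back.
    attempts = int(COUNTER.read_text()) if COUNTER.exists() else 0
    if attempts >= MAX_FAILED_BOOTS:
        subprocess.run(["snapper", "rollback"], check=False)  # revert to pre-update snapshot
        attempts = 0
    COUNTER.write_text(str(attempts + 1))

def on_boot_success():
    # Reaching a healthy state resets the counter.
    COUNTER.write_text("0")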

Humans tend to think they can sacrifice quality for speed, which works for some time and then fails miserably. It's a bit like the uncertainty principle: there is a fundamental limit that cannot be cheated.

wernerlippert

I want to add that the fact that CrowdStrike is so widely used makes it a target for bad actors, and perhaps how it operates internally, which seems to be monolithic, is also a problem. We also do not know what government and military systems were affected by this "bug". Regardless of the other bad practices that were at play, CrowdStrike itself may want to consider a less monolithic approach and perhaps break up its platform into shards, such that entire industries are not impacted by one bad software update or a bad pod.

_SR_

This is a reminder of how fragile our IT solutions are. Imagine a solar storm occurring and the devastation it would cause! We need a plan B for critical infrastructures to always be in place!

samarbid

So, basically Crowdstrike could not even secure itself against itself. Well done Crowdstrike, well done! (Slowly clapping) To Microsoft, get rid of Crowdstrike, no IFS and no BUTTS!

keithnsearle

I find it utterly incredible that they don’t test the update on a sandboxed system before sending it out.

MadeleineTakam

My rage at everyone downplaying this for CrowdStrike is immeasurable. This is a billion-dollar company, with a B, trusted by critical government, public, and private services, and they shafted each and every one of them. The lack of outrage from our authorities is absolutely disgusting. It speaks volumes about the state of cybersecurity and tech in general.

ProfessionalBirdWatcher

This was an embarrassing failure for Crowdstrike. All they had to do was test their patch on Windows PCs prior to release, and they would have seen those PCs blue screen. They could have fixed the issue, tested again, and THEN deployed. The more devices you’re responsible for, the greater the duty to test prior to deployment. This was negligence, pure and simple, and there should be a class action suit against Crowdstrike for the damages they caused. Such a suit would destroy Crowdstrike, of course, but that’s as it should be. Our world needs to deter this negligence in the future.

mitchellsmith

The issue essentially is that there is a kernel-mode driver - no doubt WHQL-certified - that is running uncertified p-code from installable 'definition' files, so a bug there will cause the kernel-mode driver to execute bad code and bug-check the system. Perhaps the kernel-mode driver needs better checking and self-defence - could the WHQL certification process require this? The 'fix' is to gain access to safe mode, boot without the driver, and then remove the installable definition files, so perhaps the system should identify crashing 'boot-required' drivers and sideline them if they crash repeatedly.

yogibarista
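
To make the "better checking and self-defence" point concrete, here is a minimal Python sketch of validating a definition file before anything interprets it; the header layout (magic, version, body length, SHA-256 digest) is entirely made up for illustration, not CrowdStrike's actual format:

import hashlib
import struct

MAGIC = b"DEF1"                    # hypothetical 4-byte magic
HEADER = struct.Struct("<4sII")    # magic, version, body length
DIGEST_LEN = 32                    # SHA-256 digest stored after the header

def load_definitions(raw: bytes) -> bytes:
    if len(raw) < HEADER.size + DIGEST_LEN:
        raise ValueError("file too small to contain a valid header")
    magic, version, body_len = HEADER.unpack_from(raw, 0)
    digest = raw[HEADER.size:HEADER.size + DIGEST_LEN]
    body = raw[HEADER.size + DIGEST_LEN:]
    if magic != MAGIC or version != 1:
        raise ValueError("unknown magic or version, refusing to load")
    if len(body) != body_len:
        raise ValueError("length field does not match file size")
    if hashlib.sha256(body).digest() != digest:
        raise ValueError("checksum mismatch, file is corrupt")
    return body   # only a file that passes every check gets interpreted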