CrowdStrike Update: Latest News, Lessons Learned from a Retired Microsoft Engineer

Follow me for updates!
Twitter: @davepl1968

1. Introduction to the CrowdStrike Falcon IT Outage:
• Overview of the recent CrowdStrike Falcon IT outage and its impact on various industries.
2. Technical Details of the Outage:
• Explanation of the faulty sensor configuration update and how it led to system crashes (BSOD) on Windows systems.
• Specifics about the corrupted “Channel File 291.”
3. Impact and Response:
• Description of the scale of the outage, affecting approximately 8.5 million devices worldwide.
• Steps taken by CrowdStrike to deploy a fix and provide mitigation guidance to affected customers.
4. Previous Issues with Linux Systems:
• Recap of earlier incidents where CrowdStrike updates caused crashes on Debian and Rocky Linux systems.
5. CrowdStrike on macOS:
• Discussion about CrowdStrike’s security solutions for macOS and their use of Apple’s System Extensions.
6. Kernel vs. User Mode in Security Software:
• Analysis of why kernel-mode access is used by CrowdStrike and the associated risks.
• Historical context of kernel vs. user mode in Windows drivers.
7. Regulatory Challenges:
• Narrative on Microsoft’s attempt to introduce an API to prevent such issues and the regulatory hurdles faced from the European Union, which deemed it anticompetitive.
8. Conspiracy Theories and Broader Lessons:
• Overview of conspiracy theories that emerged around the outage.
• Lessons to be learned from the incident, drawing a parallel to the Tylenol crisis management.

I'm long since retired, and any opinions are mine alone; not a spokesperson!
Comments

"Standing around with their disks in their hand" was such a great quote lol

hotzemusic

This was an obvious CrowdStrike procedural error. What _should_ have happened:
1. The update is sent to automated testing.
2. The tests come back with crashes within 2-5 minutes.
3. The guy who made the all-zero file would be very embarrassed by that mistake.
4. The code that apparently doesn't check its input would have been flagged for improvement.
5. Nobody outside the dev company would have heard of it. Just another day in the office, doing kernel-level work responsibly.
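
The input check in step 4 could look roughly like the sketch below. The file layout here (a 4-byte `MAGIC` header followed by a payload) is purely hypothetical and is not CrowdStrike's actual channel file format; the point is only that a parser can reject an all-zero or malformed blob before acting on it.

```python
MAGIC = b"CFG1"  # hypothetical header bytes, invented for this sketch
MIN_SIZE = len(MAGIC) + 1  # header plus at least one payload byte


class InvalidUpdateError(ValueError):
    """Raised when an update blob fails basic sanity checks."""


def validate_update(blob: bytes) -> bytes:
    """Return the payload if the blob looks sane, otherwise raise."""
    if len(blob) < MIN_SIZE:
        raise InvalidUpdateError("file too short")
    if not any(blob):
        # Every byte is zero: the failure mode described in step 3.
        raise InvalidUpdateError("file is all zeros")
    if not blob.startswith(MAGIC):
        raise InvalidUpdateError("bad header")
    return blob[len(MAGIC):]
```

An automated test (steps 1 and 2) would then feed `validate_update` deliberately broken files, such as `bytes(1024)`, and fail the build if any of them gets through.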

christopherg

My first C course back in school. The prof asked us to write a program that asked for a date d and a number n, and returned the date n days after d. Beginner stuff, but you have to start somewhere.
When the first student completed the assignment, the prof went to their computer and, when asked for a date, typed: "did you trust the user input?"
Program crashed, lesson learned.
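
A minimal sketch of a version of that assignment that survives the prof's test, written in Python rather than C for brevity. The function name `add_days` and the `YYYY-MM-DD` input format are assumptions made for this example, not part of the original assignment.

```python
from datetime import datetime, timedelta


def add_days(date_text: str, days_text: str) -> str:
    """Return the date `days_text` days after `date_text` (YYYY-MM-DD).

    Raises ValueError with a readable message instead of crashing on bad input.
    """
    try:
        start = datetime.strptime(date_text.strip(), "%Y-%m-%d").date()
    except ValueError:
        raise ValueError(f"not a valid YYYY-MM-DD date: {date_text!r}") from None
    try:
        days = int(days_text.strip())
    except ValueError:
        raise ValueError(f"not a whole number: {days_text!r}") from None
    try:
        # Both the timedelta and the addition can overflow for huge values.
        return (start + timedelta(days=days)).isoformat()
    except OverflowError:
        raise ValueError("resulting date is out of range") from None
```

Typing "did you trust the user input?" at the date prompt now produces a `ValueError` the caller can report and recover from.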

XH

“It choked, turned blue and died”
I love the way you explain these technical items in such a clear and funny way.

patrickhoveling

Way back in high school my programming teacher taught us that our user interface code had to be bulletproof. A 'bullet' was then defined as a hyperactive ten year old at the keyboard. One of the first tests he would do is to mash both hands on the keyboard. My program was expected to handle that gracefully.
I've been using that philosophy on everything I do ever since.

BFLmouse

"The code just raw dogs it and hopes for the best"
You almost caused me to spit out my drink

spidalack

Most telling is that CrowdStrike has already done the same thing to smaller code bases, i.e. those two Linux versions mentioned and has obviously not fixed the internal processes that allowed corrupt updates to be released.

Thank you Dave for getting more of the story.

roycsinclair

"I try to never attribute to malice that which can be sufficiently explained by incompetence." Greatest statement ever

whatisthis__

We had our own Tylenol scare in Australia. A woman sent threats to Arnott’s, a biscuit company, saying that she had contaminated a line of biscuits in some states (if I remember correctly), and Arnott’s response was to recall every line of biscuits in every state immediately. They also published full-page ads of the police reports and any updates that had come from the police. They instructed people not to eat any of their biscuits. When they found the woman and dealt with her, Arnott’s didn’t lose any market share; in fact, they gained market share because of their demonstrated trustworthiness. It has also become a case study in how to deal with a crisis in business.

uzaiyaro

There is so much value added here. You can really tell that Dave has had to present -- "ok, what had happened was" to a design review before. The bit about standing around with their disks in their hands ... that was priceless sir :D

lindoran

"Never attribute to malice that which is adequately explained by stupidity."
- Hanlon's razor

mrt_

Regarding the Linux issues with CrowdStrike: the important difference there is that those were problems with the falcon-sensor itself. Those were discovered through the normal server patching process. Since we patch test machines first, we were able to find the problem before it hit any production servers. Having the problem in an automatically downloaded channel update is a big difference.
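
The "patch test machines first" process described above can be sketched as a ring-by-ring rollout. Everything here is hypothetical: `staged_rollout`, the ring layout, and the `apply_update` callback are illustrative names, not any vendor's tooling.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    rings: Sequence[Sequence[str]],
    apply_update: Callable[[str], bool],
) -> List[str]:
    """Apply an update ring by ring; stop after the first ring with a failure.

    Returns the hosts that were updated successfully.
    """
    updated: List[str] = []
    for ring in rings:
        results = [(host, apply_update(host)) for host in ring]
        updated.extend(host for host, ok in results if ok)
        if not all(ok for _, ok in results):
            break  # later rings (e.g. production) are never touched
    return updated
```

With rings ordered as test machines first and production last, a crash on any test host stops the update before it reaches production. An automatically downloaded update that skips this ordering gets no such protection.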

ChrisCandreva

“Their code just kind of raw dogged it” 😂😂

One of many quotable moments in this video. Bravo, you 56-year-old gem of a nerd.

Hamiltron_

This is a very well thought out, well presented and useful video!! I recently sold my condo for $400k and I want to invest the money in the stock market. However, it appears the market is at an all-time high. Should I invest elsewhere or wait for a market correction?

floydchusset

Thanks, Dave, especially the excellent review of the Tylenol crisis. I was working for J&J, teaching Mr. Burke to use his new IBM PC at the time. He was a marvelous leader and manager!

jlbytbx

My new favorite quote I heard from this situation is "Any sufficiently advanced incompetence is indistinguishable from malice" and I felt that

thomygoldman

"While I would never dare to question the wisdom of printer designers..." OMG, the SNARK!!! I'm crying! 🤣🤣🤣

aaronriggan

DevOps guy here. In our organization, code has to pass a successful pull request build with unit tests, a main build with unit tests, and deployment to two environments where the code is tested with other tools. When a bug reaches production, it is the fault of the entire organization, not just one developer. The exception: when some manager with an agenda shortcuts the process and rushes something to prod. Then it's solely their fault.

JamesQMurphy

Falcon represents a significant burden for developers, acting as corporate security bloatware that penalizes them for basic tasks like compiling code. This results in lost productivity and siphons money from companies without providing a tangible return on investment. Executives often overlook these issues until a major crisis occurs, yet the decision-makers responsible are rarely held accountable. Consulting a financial advisor could help companies evaluate the true cost and financial impact of such security measures on their overall operations and productivity.

Dawnjohnston-c

I am a tech guy, but not a hard-core coder. I have never watched any videos like these, where the explanation is broken down so well that, in my humble opinion, darn near anyone could understand it. Seriously great videos and great explanations of a complex topic. Truly appreciated. I shared your last video with a lot of people, and will be sharing this one as well. Great job!!

KeyBorg