when a null pointer dereference breaks the internet lol

but it may not be the devs' fault.

Don't know why you'd want to follow me on other socials. I don't even post. But here you go.
Comments

UPDATE: New info reveals it was a logic flaw in Channel File 291 that controls named pipe execution, not a null pointer dereference like many of us thought (although the stack trace indicates it was a null pointer issue, so Crowdstrike could be covering). Devs' fault 100% (in addition to having systems in place that allow this sort of thing). Updates to Channel Files like these happen multiple times a day.

fknight

Fun fact: the null pointer dereference was also present in the Linux CrowdStrike sensor; the Linux kernel just handled it like a boss

vilian

“You cannot hack into a brick”
-Crowdstrike, 2024

capn_shawn

But it's still their fault for pushing it out to everything everywhere all at once.

whickervision

Crowdstrike is DEFINITELY still at fault. You never ever ever push an update out live to millions of computers without extensive testing and staged rollouts, especially when that update involves code that runs at the kernel level!

bluegizmo

I still don't understand how this patch didn't brick the machines they tested it on; the idea that a company worth $70 billion didn't catch this in CI or QA is mind-blowing

stevezelaznik

Saying the root cause was a "null pointer dereference" is like saying the problem with driving into a telephone pole is that "there was a telephone pole in the way." The root cause was sending an update file that was all null bytes. The fact that the operating system executed that file and reported a null pointer dereference as a result is not the fault of the OS, and is not a root cause.
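To make that chain concrete, here is a minimal user-space sketch of a parser that blindly trusts a pointer-sized field read out of a content file. The structures and field names are hypothetical (the actual sensor code isn't public); the point is only that an all-null-bytes file yields a zero field, the computed pointer is NULL, and the crash lands on the dereference, which is what a stack trace reports even though the real problem is the file.

/* Hypothetical illustration only: a parser that trusts the contents of a
 * "channel file". If the file is all zero bytes, the field it reads is 0,
 * the computed pointer is NULL, and the dereference crashes -- which is what
 * the stack trace shows, even though the root cause is the bad file. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct channel_header {
    uint32_t magic;         /* expected file signature */
    uint32_t rule_offset;   /* treated here as an absolute address, for illustration */
};

struct rule {
    uint32_t id;
    uint32_t flags;
};

/* No validation at all: the "drive straight into the telephone pole" case. */
static uint32_t first_rule_id(const uint8_t *file, size_t len)
{
    (void)len;  /* the unchecked version never even looks at the length */
    const struct channel_header *hdr = (const struct channel_header *)file;
    const struct rule *r = (const struct rule *)(uintptr_t)hdr->rule_offset;
    return r->id;  /* rule_offset == 0  =>  NULL dereference here */
}

int main(void)
{
    uint8_t bad_file[4096];
    memset(bad_file, 0, sizeof bad_file);   /* an "all null bytes" update */
    printf("%u\n", first_rule_id(bad_file, sizeof bad_file)); /* crashes */
    return 0;
}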

mitchbayersdorfer

Tired of hearing that Y2K was some panic or something that just magically fixed itself or wasn't a big deal. It wasn't a big deal because people spent years beforehand fixing it.

plaidchuck

giving any software unlimited kernel access is just crazy to me

samucabitim

This is precisely why you actually test the package that is being deployed. If you move release files around, you need to ensure that the checksums of those files match.
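A rough sketch of that kind of gate: hash the artifact that is actually about to ship and compare it against the digest the build pipeline recorded. The file name and expected digest are supplied on the command line here, and it assumes OpenSSL's EVP API is available.

/* Sketch: refuse to deploy an artifact whose SHA-256 doesn't match the
 * digest the build pipeline recorded. Build with: cc verify.c -lcrypto */
#include <openssl/evp.h>
#include <stdio.h>
#include <string.h>

static int sha256_file(const char *path, char hex_out[65])
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    EVP_MD_CTX *ctx = EVP_MD_CTX_new();
    EVP_DigestInit_ex(ctx, EVP_sha256(), NULL);

    unsigned char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        EVP_DigestUpdate(ctx, buf, n);

    unsigned char digest[EVP_MAX_MD_SIZE];
    unsigned int dlen = 0;
    EVP_DigestFinal_ex(ctx, digest, &dlen);
    EVP_MD_CTX_free(ctx);
    fclose(f);

    for (unsigned int i = 0; i < dlen; i++)
        sprintf(hex_out + 2 * i, "%02x", digest[i]);
    return 0;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <artifact> <expected-sha256>\n", argv[0]);
        return 2;
    }

    char actual[65] = {0};
    if (sha256_file(argv[1], actual) != 0) {
        perror("open artifact");
        return 2;
    }

    if (strcmp(actual, argv[2]) != 0) {
        fprintf(stderr, "checksum mismatch: refusing to deploy\n");
        return 1;
    }
    puts("checksum ok");
    return 0;
}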

rickl

Let me guess. Maybe Crowdstrike recently laid off a stack of experienced developers who knew what they were doing, but were expensive, and kept the not so experienced developers who didn't know what they were doing, but were cheaper.

Then on top of that, because of the reduced head count but the same workload, the developers were under pressure and cut corners to rush the product out.

I'm not saying that is what happened. But I have seen that happen elsewhere, and I'm sure people can come up with loads of examples from their own experiences.

JohnSmall

As the name says: crowd strike, every device goes on strike

adwaithbinoy

Thank you for your insights. Man, I hope CrowdStrike does a thorough post-mortem for this one. That's the least they owe the IT professionals at this point.

Bregylais

It did not break "the internet". It broke a lot of companies' office computers, but those are not on the internet. In fact, the internet chugged along just fine.

bart

I have no doubt they will do a thorough investigation, as this had such a massive impact, with billions of dollars of implications.

coltenkrauter

At the most fundamental level, it is obvious that CrowdStrike never tested the actual deployment package. Things can go wrong at any stage in the build pipeline, so you ALWAYS test the actual deployment package before deploying it. This is kindergarten-level software deployment management. No sane and vaguely competent engineer would voluntarily omit this step. No sane and vaguely competent manager would order engineers to omit this step. Yet the step was definitely omitted. I hope we get an honest explanation of how and why this happened.

Of course, then you get into the question of why they didn't do incremental deployments, which is another ultra-basic deployment best practice. I am beginning to form a mental image of the engineering culture at CrowdStrike, and it's not pretty.
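On the incremental-deployment point, one common pattern is a deterministic rollout gate: each host maps a stable identifier to a bucket from 0 to 99 and only takes the new content once the advertised rollout percentage covers its bucket, so a bad update hits 1% of the fleet before it hits 100%. A toy sketch, with the hash choice, host IDs, and percentage source all made up for illustration:

/* Toy staged-rollout gate: a host applies an update only if its bucket
 * (derived from a stable host identifier) falls under the current rollout
 * percentage advertised by the update server. */
#include <stdint.h>
#include <stdio.h>

/* FNV-1a: any stable hash works; it only needs to spread hosts evenly. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (uint8_t)*s;
        h *= 16777619u;
    }
    return h;
}

static int update_allowed(const char *host_id, unsigned rollout_percent)
{
    unsigned bucket = fnv1a(host_id) % 100;   /* 0..99, stable per host */
    return bucket < rollout_percent;
}

int main(void)
{
    /* rollout_percent would ramp on the server side, e.g. 1 -> 10 -> 100. */
    const char *hosts[] = { "host-0001", "host-0002", "host-0003" };
    for (unsigned pct = 1; pct <= 100; pct *= 10)
        for (int i = 0; i < 3; i++)
            printf("pct=%3u %s -> %s\n", pct, hosts[i],
                   update_allowed(hosts[i], pct) ? "update" : "wait");
    return 0;
}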

isomeme

I have not written operating system code, but generally code is supposed to validate data before operating on it. In my opinion, developers are very likely the cause. Even if there is bad data, the developers should write code that handles it gracefully.

Also, this video asserted that this kind of issue could slip by the test servers. That sounds ridiculous to me. The test servers should fully simulate real world scenarios when dealing with this kind of security software. They should run driver updates against multiple versions of windows with simulated realistic data.

But, I would be surprised if a single developer was at fault. Because there should be many other developers reviewing all of the code. I would expect an entire developer team to be at fault.

It'll be interesting to learn more.
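For what it's worth, the validate-before-use idea looks something like the sketch below, applied to the same kind of hypothetical content-file layout as the earlier example: check the size, signature, and bounds before following anything the file points at, and fall back to the last known-good file on failure. None of this is the actual sensor code; it just shows the shape of graceful handling.

/* Hypothetical content-file validation: check size, signature, and bounds
 * before touching anything the file points at; on failure, fall back to the
 * previous known-good file instead of crashing. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct channel_header {
    uint32_t magic;        /* expected file signature */
    uint32_t rule_count;
    uint32_t rule_offset;  /* byte offset of the rule table within the file */
};

struct rule {
    uint32_t id;
    uint32_t flags;
};

#define CHANNEL_MAGIC 0xC0FFEE01u   /* made-up signature value */

static const struct rule *validate_rules(const uint8_t *file, size_t len,
                                         uint32_t *count_out)
{
    if (len < sizeof(struct channel_header))
        return NULL;                      /* too small to even hold a header */

    struct channel_header hdr;
    memcpy(&hdr, file, sizeof hdr);       /* copy out to avoid unaligned access */

    if (hdr.magic != CHANNEL_MAGIC)
        return NULL;                      /* an all-zero file fails here */
    if (hdr.rule_offset > len || (hdr.rule_offset & 3) != 0 ||
        hdr.rule_count > (len - hdr.rule_offset) / sizeof(struct rule))
        return NULL;                      /* rule table out of bounds */

    *count_out = hdr.rule_count;
    return (const struct rule *)(file + hdr.rule_offset);
}

int main(void)
{
    uint8_t bad_file[4096] = {0};         /* the all-null-bytes update */
    uint32_t count = 0;

    if (!validate_rules(bad_file, sizeof bad_file, &count))
        puts("content file rejected, keeping last known-good configuration");
    return 0;
}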

coltenkrauter

The internet was not broken. Not sure why people kept saying it was

dondekeeper

Doesn’t macOS fail gracefully when a kext misbehaves? If so, you can still technically blame Windows for not handling that situation well

bob_kazamakis

First rule of patch management is you don't install patches as soon as they are available.

If I know that, then why some of these massive companies don't is beyond me. It seems that IT management has forgotten the fundamentals.

Also technically it can be done remotely if it's a virtual machine or remote management is enabled.

originalbadboy