How Bad Leap Day Math Took Down Microsoft

preview_player
Показать описание
A look into one of the largest leap day bugs in history, as well as how Microsoft Azure's compute platform works (well, worked - it has been over 10 years. Though the fundamentals likely remain the same).

Sources:

Chapters:
0:00 Intro
0:43 Cloud Stuff
3:41 Azure VM Stuff
5:10 The Incident
7:59 Mitigation
10:23 Aftermath

Corrections:
-

Music:
- Ubiquitous by Diamond Ortiz
- Jane Street by Track Tribe
- Funk Game Loop by Kevin Macleod
Рекомендации по теме
Комментарии
Автор

"instructed to do it 3 times cause 2 isn't enought and 4 is too many" I am dying at this

hadipawar
Автор

"Double it and give it to the next person" - Automatic Service Healing, on bugs

abebuckingham
Автор

Azure AD, also known as Microsoft Identity, also known as Entra ID, also known as sadness

MasonBitByte
Автор

A libaba cloud, oracle cloud, IBM cloud, google cloud. Its golden

MidnightMidas
Автор

Love that the datacenter in Australia was upside down. Nice touch

Lars
Автор

It's a good day when Kevin Fang uploads.

ShadowSlayer
Автор

10:20 I love that you rolled the date over to 2/30.

InspectorGadget
Автор

Imagine waiting 75 minutes for a VM initialisation. Why would it take 25 minutes? Is there an intern hand-delivering the public key across the facility?

catcatcatcatcatcatcatcatcatca
Автор

the AWS teasing at the beginning had me feeling the square hole trauma all over again

amyisreallybored
Автор

I'm in love with your visualizations, they're eye candy

MHX
Автор

Everytime a coworker suggests building our own date library

worgenzwithmz
Автор

i like the fact that all the high availability/disaster recovery stuff inevitably ends up making the situation into something way worse than if we had just let it fail and tell customers to go take a break

aeghohloechu
Автор

The Zune also had a similar bug. I believe the solution was simply to wait until it was no longer the day in question.

C.I...
Автор

These are so freaking good! I particularly loved the timeline at 10:19, as that is such a Microsoft thing: resolving an issue on February 30th.

Phroggster
Автор

It was fixed the next day??? They might as well have done nothing and it would've fixed itself!

pdlbackup
Автор

Just wait for Y2K38. The Epochalypse will do this to tons of outdated, unmaintained, embedded systems, or just faulty code worldwide.

jonathangawrych
Автор

Some months ago I was manually typing DAX formulas in Powerquerry and one consisted on subtracting one year. It only took me 5 minutes to realise "but what about leap years". How did Microsoft not think about this?
Sadly, by the time a leap year comes, everyone at work will have forgotten my Excel.

Автор

0:50-0:55 The lengths you went to avoid saying AWS is commendable.

justicefool
Автор

Insane that this first happened only 12 years ago.
I had to double check why this didn't occur earlier and made the realization that azure really is only 14 years old. Crazy.

seifenspender
Автор

I’ll tell you why it took 5 hours to fix the bug, they spent 4 hours and 50 minutes in meetings strategizing about how the engineers would identify the bug and the procedure to test any changes that would go out.

nicholascopsey