Uber Eats, PayTM and Free Food: What Happened?

Anyone could order free food on UberEats for a weekend in 2019. Years later, I can finally share what really happened behind the scenes.

00:00 - Intro
02:10 - What happened?
05:00 - On HTTP status codes
07:10 - The change
08:50 - Failing open vs failing closed
11:05 - How we discovered the bug
12:50 - The fix and learnings
19:15 - What was the damage and who should pay?
Comments

I was the one who ate a lot during that time. Gained 5 kg in a week, NGL.

MrPsycic

I think Uber has to own this one. There is no such thing as implicit idempotency. If the PSP does not explicitly promise idempotency, the service should be assumed to be non-idempotent until the issue can be cleared up by contacting the PSP. Uber deals with many PSPs and the Payments team were in a better position to recognize this.

That being said, a non-idempotent PSP clearly has room for improvement with regard to service design. Also, if there had been a reconciliation workflow in place, Uber might have caught this earlier. But at the end of the day, Uber chose to integrate with the PSP despite a non-optimal service design, and they should take the hit for that decision. :)

henrikschmidt-mller

I think Uber is primarily at fault here, and the main decision that led to the overall issue was to fail open. If Uber wants to reap the benefits of failing open, they should also own the costs of failing open, especially when they do not monitor the unknown failures. However, if the PSP had documented the expected results for a failure case and then changed them out from under Uber's feet, without telling Uber, so that the failure case behaved differently, that is explicitly a breaking change, and Uber and the PSP should own the failure 50:50.

tkiwisi

I think it's a mistake on the PSP side, because the code written by the API consumer had already been working for the last 4 years. When a contract change happens on the provider side, the consumer definitely needs to be informed before it is rolled out, because there is no way the consumer will be aware of a sudden message change. As for alerts, the team that wrote the consumer code may not even have thought about setting an alert on responses, because before integrating Paytm the agreement would probably have had some business-side clauses, one of which would be to communicate contract changes (I don't know if this is considered a clause when bringing on external SDKs). As for who paid for the food: restaurants would have gone after UberEats, not Paytm. And UberEats asking Paytm for the money does not seem legitimate, since the provider is an external party and has no relation with the customers or the company in any way except for the SDK.

Kudos to people who enjoyed free food.

Thanks for the video!

IdreesDargahwala

Thanks for sharing this interesting incident. Please keep posting these kinds of "weird" topics 😀. I am an avid follower of yours 🙏

vmnn

This was so insightful. You mentioned this was the second biggest loss for Uber payments while you were there. Will you be sharing the story behind the biggest loss? :p

paras

Thanks for sharing! Great lesson. Did you change the behavior to fail closed after that? Or did you implement some alerts for the new statuses?

parisssss.e

Changing the behaviour of an API on a Friday... Would love to hear their side of things. This almost sounds like sabotage...

Anyway, from the point of view of a backend engineer, IMHO the provider is the one who did it wrong. You can't just change observable behaviour... There is always at least one customer who implemented against it.

einCAA

Why did no one mention API versioning? The PSP should use that and introduce changes as a new API version. After some time, the old endpoints should be disabled, but by then they can be sure that consumers have migrated to the new version, which returns different responses etc.

s

Would have been interesting to see a poll on this.

As a neutral party, it's difficult to put the blame on just one side. You already laid out the points pretty clearly.

The PSP could:
1. Have a fixed release plan with release notes communicated well in advance.
2. Document response codes/payloads well. This really matters for services that are integrated with lots of high-risk services.

Uber as a consumer could:
1. Fail open only for known errors instead of returning "true" in the last catch statement (have seen that kind of code :p )
2. Monitor the fail-open rate, which could have detected the issue pretty quickly. For example, page if the rate breaches a certain threshold.

It comes down to what the contract with the PSP looked like in terms of SLOs.
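
To make the two consumer-side points concrete, here is a minimal sketch, not anything Uber actually ran. The status strings, the `decide` function, the `FailOpenMonitor` class, and the window and threshold values are all hypothetical, made up for illustration.

```python
from collections import deque
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"                      # payment confirmed by the PSP
    ALLOW_FAIL_OPEN = "allow_fail_open"  # known transient error: let the order through
    DENY = "deny"                        # anything else: fail closed


# Hypothetical error codes; real ones would come from the PSP's documentation.
KNOWN_TRANSIENT_ERRORS = {"TIMEOUT", "RATE_LIMITED"}


def decide(psp_status: str) -> Decision:
    """Fail open only for allow-listed errors, never in a catch-all branch."""
    if psp_status == "SUCCESS":
        return Decision.ALLOW
    if psp_status in KNOWN_TRANSIENT_ERRORS:
        return Decision.ALLOW_FAIL_OPEN
    return Decision.DENY  # unknown or newly introduced status: fail closed


class FailOpenMonitor:
    """Tracks the share of fail-open decisions over the last `window` orders."""

    def __init__(self, window: int = 100, threshold: float = 0.05):
        self.recent = deque(maxlen=window)
        self.threshold = threshold

    def record(self, decision: Decision) -> bool:
        """Record one decision; return True if the rate breaches the threshold."""
        self.recent.append(decision is Decision.ALLOW_FAIL_OPEN)
        rate = sum(self.recent) / len(self.recent)
        return rate > self.threshold
```

With this shape, a status the consumer has never seen before is denied rather than silently allowed, and a sudden jump in fail-open decisions returns `True` from `record`, which is where a page would be triggered. A real setup would probably also want a minimum sample size before paging.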

chimp

I saw the post on LinkedIn and my first thought was the same: who returns 200 OK even for API failures? After the video, it seems most PSPs do that. Will read about this more.
Also would love to hear about other such incidents, especially the one that resulted in the biggest loss.

ashishmittal

In my opinion, if the PSP contractually guaranteed the endpoint is idempotent, but it wasn't, it's the PSP's fault. If they didn't, it's Uber's fault for lack of due diligence.

GreeProductions

The reason most payment companies send 200 is that it means "we got your message", but the status is just a mess.

aashaytambi

I had used this trick and ordered 100s of Mango and other shakes. 🤣🤣🤣

saurabhshubham

I think being defensive and resilient is good, but this does not mean that what the said PSP did was justified. There should have been proper communication, and failing that, at least the deployment should have happened in stages, as you mentioned. So I find the PSP at fault here.

rajatexplains

Did he talk about the biggest loss yet?

zuowang

If you have a proper microphone lying around, it might be worthwhile to dig it up. Your laptop's mic gets saturated (:

Wyvernnnn