Denormalizing DB for Justin Bieber #database #sql #webdevelopment

Comments

It's such a simple thing, yet a franchise-level industry still won't accept it as secure or offer it as an option

rayforever

For people saying this displays an incorrect number of likes: YouTube does the same with view counts. The transactions are queued, and race conditions are not a big deal from a user's perspective.

ianbdb

Eventual consistency is a fair trade-off for performance.

christophcooneyoff

The fact that they even counted shows they never thought that thing would scale to more than 100k users

lefteriseleftheriades

This is so untrue. Relational databases are very good at counting rows and will typically not even read the table, instead using a B-tree index to count. What he should say is that the hash join of two tables is slow.

This is actually bad advice: UPDATE is a very expensive operation on a key table and will generate logging data to deal with concurrency. Ideally there should be a separate post_statistics table, which can be unlogged for high-performance writes while trading off durability.

Hobson
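
A minimal sketch of the separate post_statistics table suggested in the comment above, assuming PostgreSQL; everything beyond the post_statistics name is a hypothetical schema:

```sql
-- Keep the hot counter out of the main posts table. UNLOGGED skips WAL
-- writes, trading durability (the table is truncated after a crash) for
-- much cheaper high-frequency updates.
CREATE UNLOGGED TABLE post_statistics (
    post_id    BIGINT PRIMARY KEY,
    like_count BIGINT NOT NULL DEFAULT 0
);

-- One atomic statement per like; no read-modify-write round trip.
INSERT INTO post_statistics (post_id, like_count)
VALUES (42, 1)
ON CONFLICT (post_id) DO UPDATE
SET like_count = post_statistics.like_count + 1;
```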

It's a bit more complicated for Instagram, where, in order to drive engagement, it also shows which of the people you're following liked the post. So besides the count, which is global, there's an aggregate query contextual to the logged-in user's first-degree connections to the post.

wywarren
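
A minimal sketch of that contextual query, with hypothetical likes(post_id, user_id) and follows(follower_id, followee_id) tables; :post_id and :viewer_id are bind-parameter placeholders:

```sql
-- Which of the viewer's first-degree connections liked this post?
SELECT l.user_id
FROM likes l
JOIN follows f
  ON f.followee_id = l.user_id
 AND f.follower_id = :viewer_id
WHERE l.post_id = :post_id
LIMIT 3;  -- only a few familiar names are shown next to the global count
```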

Reminds me of a NoSQL rule:

"Store related data together in a single structure to optimize retrieval, a principle particularly emphasized in NoSQL databases for efficient querying."

Bcs-Mohtisham

It makes me feel like less of a bad programmer, knowing this was the solution.

aquiace

Amazing. I don't know whether it does what you said it does, but this is fixing a real-life problem that definitely cost the company money. The fact that you share your point about fixing it for free is amazing, even if it doesn't necessarily help Instagram. I'm subbing.

emagenation

Some databases allow you to do something like that automatically, so when a like row gets inserted/removed it automatically updates the count. Or you could just run the count operation every 1-2 minutes and cache the result in the post row :)

jackdavenport
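
A minimal sketch of the automatic variant described above, assuming PostgreSQL and hypothetical posts/likes tables, using an AFTER trigger:

```sql
-- Keep posts.like_count in sync whenever a like row is inserted or deleted.
CREATE OR REPLACE FUNCTION sync_like_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE posts SET like_count = like_count + 1 WHERE id = NEW.post_id;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE posts SET like_count = like_count - 1 WHERE id = OLD.post_id;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER likes_sync_count
AFTER INSERT OR DELETE ON likes
FOR EACH ROW EXECUTE FUNCTION sync_like_count();
```

The periodic count-and-cache alternative the comment mentions avoids the per-like UPDATE entirely, at the cost of a slightly stale number.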

When I created an e-commerce website and app, the first time I just created a column named total_likes to show how many likes each product got. But later, for marketing reasons, I needed to know who liked a product so we could send marketing emails, so I had to create a new table, e.g. liked_by_user_id: if we want to send a marketing email about some products, we just go to that table, find who liked them, and send the email. So I suggest you do both solutions: create a separate table and save the IDs of the users who liked the post or product, and for speed reasons also add a total_likes column to the product or post itself. This way you will have everything. It may make the database bigger, because now you have another big table.

algeriennesaffaires
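
A minimal sketch of the "do both" approach described above; the table and column names are hypothetical:

```sql
-- Per-user record: who liked what (used for marketing lookups).
CREATE TABLE product_likes (
    product_id BIGINT NOT NULL,
    user_id    BIGINT NOT NULL,
    PRIMARY KEY (product_id, user_id)
);

-- Denormalized counter for fast display on the product page.
ALTER TABLE products ADD COLUMN total_likes BIGINT NOT NULL DEFAULT 0;

-- Marketing: everyone who liked product 42.
SELECT user_id FROM product_likes WHERE product_id = 42;
```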

It's more complicated than even that. As others have noted, you don't need (or want) to display a perfectly up-to-date like count, so it can be eventually consistent. The other important factor is despamming. A user clicking "like" and the like counter incrementing is not a single, atomic operation. If the user is identified as a spam bot, then their like should not count for the purpose of promoting the post.

The real workflow is more like:

1. User clicks "like." Store that.
2. Start a batch despam pipeline to figure out whether the like is legitimate.
3. If so, update the like count on the post.

The despam pipeline is going to take some time. In this case, nobody but spammers actually cares about accuracy down to a single like, so lazily updating is just fine.

haxney
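
A minimal sketch of that lazy workflow, with hypothetical tables and a status column standing in for the despam pipeline's verdict:

```sql
-- 1. Store the click immediately; nothing user-visible depends on it yet.
INSERT INTO likes (post_id, user_id, status)
VALUES (:post_id, :user_id, 'pending');

-- 2. A batch despam job marks rows 'legitimate' or 'spam' out of band.

-- 3. Periodically fold the verified likes into the denormalized counter.
UPDATE posts p
SET like_count = sub.cnt
FROM (
    SELECT post_id, COUNT(*) AS cnt
    FROM likes
    WHERE status = 'legitimate'
    GROUP BY post_id
) AS sub
WHERE p.id = sub.post_id;
```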

I would prefer a "total likes" table with a foreign key to the post ID. Less bloat on the posts table, and it achieves the same thing. (ID, postId, Total)

christopherparke

Thanks for your music, and SQL query optimisation, Freddy!

fullstack_journey

I did this with my project, a Twitter clone web app. You're making me feel like a genius.

deynohmuturi

It may seem like an unreliable system that might produce inaccurate results, but with proper structuring you won't be able to mess it up. And it saves a lot of resources, too.

JTCF

One of the most underrated pieces of advice for large-scale systems!

morph

You don't necessarily need to put the counter in the *posts* table, either. 1,000,000 likes would mean 1,000,000 UPDATE calls on a table that is being read very often. Having another stats table, keeping the value hot in a cache, and then reading from the cache if it's present will be vroom vroom.

hotwaff

The answer is always "cache it".

Llorx

Counting rows has never been slow. It's basically a core feature of almost all relational DB systems to offer ultra-fast counting. *BUT*: counting several thousand times per second (!) introduces race conditions and counting-algorithm overhead, which can become absolutely deadly for performance because of suboptimal locking or incorrect usage of mutexes, semaphores and the like. It's not the counting itself killing the performance, it's the low scalability of it. There are databases out there which offer ultra-fast counting that is also scalable. In some cases this is realized via auto-updates of internal count fields, some use dark math wizardry, and some are simply that fast by design.

ololhxx