Denormalizing DB for Justin Bieber #database #sql #webdevelopment

Comments

It's such a simple thing, yet a franchise-level industry still won't accept it as secure or offer it as an option

rayforever

For people saying this displays an incorrect number of likes: YouTube does the same with view counts. The transactions are queued, and race conditions are not a big deal from a user's perspective.

ianbdb

Eventual consistency is a fair trade-off for performance.

christophcooneyoff

The fact that they even counted shows they never thought that thing would scale to more than 100k users

lefteriseleftheriades

This is so untrue. Relational databases are very good at counting rows and will typically not even read the table, instead using a B-tree index to count. What he should say is that the hash join of two tables is slow.

This is actually bad advice: UPDATE is a very expensive operation on a key table and will generate logging data to deal with concurrency. Ideally there should be a separate post_statistics table, which can be unlogged for high-performance writes while trading off durability.

Hobson
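
A minimal sketch of the separate post_statistics table suggested in the comment above, assuming PostgreSQL; everything beyond the post_statistics name is a hypothetical schema:

```sql
-- Keep the hot counter out of the main posts table. UNLOGGED skips WAL
-- writes, trading durability (the table is truncated after a crash) for
-- much cheaper high-frequency updates.
CREATE UNLOGGED TABLE post_statistics (
    post_id    BIGINT PRIMARY KEY,
    like_count BIGINT NOT NULL DEFAULT 0
);

-- One atomic statement per like; no read-modify-write round trip.
INSERT INTO post_statistics (post_id, like_count)
VALUES (42, 1)
ON CONFLICT (post_id) DO UPDATE
SET like_count = post_statistics.like_count + 1;
```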

It's a bit more complicated for Instagram, where, in order to drive engagement, it also shows which of the people you're following liked the post. So besides the count, which is global, there's an aggregate query contextual to the logged-in user's first-degree connections to the post.

wywarren
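
A minimal sketch of that contextual query, with hypothetical likes(post_id, user_id) and follows(follower_id, followee_id) tables; :post_id and :viewer_id are bind-parameter placeholders:

```sql
-- Which of the viewer's first-degree connections liked this post?
SELECT l.user_id
FROM likes l
JOIN follows f
  ON f.followee_id = l.user_id
 AND f.follower_id = :viewer_id
WHERE l.post_id = :post_id
LIMIT 3;  -- only a few familiar names are shown next to the global count
```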

Reminds me of a NoSQL rule:

"Store related data together in a single structure to optimize retrieval, a principle particularly emphasized in NoSQL databases for efficient querying."

Bcs-Mohtisham

It makes me feel like less of a bad programmer, knowing this was the solution.

aquiace

Amazing. I don't know whether it does what you said it does, but this is fixing a real-life problem that definitely cost the company money. The fact that you share your point about fixing it for free is amazing, even if it doesn't necessarily help Instagram. I'm subbing.

emagenation

Some databases allow you to do something like that automatically, so when a like row gets inserted/removed it automatically updates the count. Or you could just run the count operation every 1-2 minutes and cache the result in the post row :)

jackdavenport
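
A minimal sketch of the automatic variant described above, assuming PostgreSQL and hypothetical posts/likes tables, using an AFTER trigger:

```sql
-- Keep posts.like_count in sync whenever a like row is inserted or deleted.
CREATE OR REPLACE FUNCTION sync_like_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE posts SET like_count = like_count + 1 WHERE id = NEW.post_id;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE posts SET like_count = like_count - 1 WHERE id = OLD.post_id;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER likes_sync_count
AFTER INSERT OR DELETE ON likes
FOR EACH ROW EXECUTE FUNCTION sync_like_count();
```

The periodic count-and-cache alternative the comment mentions avoids the per-like UPDATE entirely, at the cost of a slightly stale number.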

When I created an e-commerce website and app, the first time I just created a column named total_likes to show how many likes each product got. But later, for marketing reasons, I needed to know who liked a product so we could send marketing emails, so I had to create a new table, e.g. liked_by_user_id: if we want to send a marketing email about some products, we just go to that table, find who liked them, and send the email. So I suggest you do both solutions: create a separate table and save the IDs of the users who liked the post or product, and for speed reasons also add a total_likes column to the product or post itself. This way you will have everything. It may make the database bigger, because now you have another big table.

algeriennesaffaires
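
A minimal sketch of the "do both" approach described above; the table and column names are hypothetical:

```sql
-- Per-user record: who liked what (used for marketing lookups).
CREATE TABLE product_likes (
    product_id BIGINT NOT NULL,
    user_id    BIGINT NOT NULL,
    PRIMARY KEY (product_id, user_id)
);

-- Denormalized counter for fast display on the product page.
ALTER TABLE products ADD COLUMN total_likes BIGINT NOT NULL DEFAULT 0;

-- Marketing: everyone who liked product 42.
SELECT user_id FROM product_likes WHERE product_id = 42;
```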

It's more complicated than even that. As others have noted, you don't need (or want) to display a perfectly up-to-date like count, so it can be eventually consistent. The other important factor is despamming. A user clicking "like" and the like counter incrementing is not a single, atomic operation. If the user is identified as a spam bot, then their like should not count for the purpose of promoting the post.

The real workflow is more like:

1. User clicks "like." Store that.
2. Start a batch despam pipeline to figure out whether the like is legitimate.
3. If so, update the like count on the post.

The despam pipeline is going to take some time. In this case, nobody but spammers actually cares about accuracy down to a single like, so lazily updating is just fine.

haxney
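
A minimal sketch of that lazy workflow, with hypothetical tables and a status column standing in for the despam pipeline's verdict:

```sql
-- 1. Store the click immediately; nothing user-visible depends on it yet.
INSERT INTO likes (post_id, user_id, status)
VALUES (:post_id, :user_id, 'pending');

-- 2. A batch despam job marks rows 'legitimate' or 'spam' out of band.

-- 3. Periodically fold the verified likes into the denormalized counter.
UPDATE posts p
SET like_count = sub.cnt
FROM (
    SELECT post_id, COUNT(*) AS cnt
    FROM likes
    WHERE status = 'legitimate'
    GROUP BY post_id
) AS sub
WHERE p.id = sub.post_id;
```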

I would prefer a "total likes" table with a foreign key to the post ID. Less bloat on the posts table, and it achieves the same thing. (ID, postId, Total)

christopherparke

Thanks for your music, and SQL query optimisation, Freddy!

fullstack_journey

I did this with my project, a Twitter clone web app. You're making me feel like a genius.

deynohmuturi

It may seem like an unreliable system that might produce inaccurate results, but with proper structuring you won't be able to mess it up. And it saves a lot of resources, too.

JTCF

One of the most underrated pieces of advice for large-scale systems!

morph

You don't necessarily need to put the counter in the *posts* table, either. 1,000,000 likes would mean 1,000,000 UPDATE calls on a table that is being read very often. Having another stats table, keeping the value hot in a cache, and then reading from the cache if it's present will be vroom vroom.

hotwaff

The answer is always "cache it".

Llorx

Counting rows has never been slow. It's basically a core feature of almost all relational DB systems to offer ultra-fast counting. *BUT*: counting several thousand times per second (!) introduces race conditions and counting-algorithm overhead, which can become absolutely deadly for performance because of suboptimal locking or incorrect usage of mutexes, semaphores and the like. It's not the counting itself killing the performance, it's the low scalability of it. There are databases out there which offer ultra-fast counting that is also scalable. In some cases this is realized via auto-updates of internal count fields, some use dark math wizardry, and some are simply that fast by design.

ololhxx