Best Practices Working with Billion-row Tables in Databases


Chapters
Intro 0:00
1. Brute Force Distributed Processing 2:30
2. Working with a Subset of the Table 3:35
2.1 Indexing 3:55
2.2 Partitioning 5:30
2.3 Sharding 7:30
3. Avoid it altogether (reshuffle the whole design) 9:10
Summary 11:30

🎙️Listen to the Backend Engineering Podcast

🏭 Backend Engineering Videos

💾 Database Engineering Videos

🏰 Load Balancing and Proxies Videos

🏛️ Software Architecture Videos

📩 Messaging Systems

Become a Member

Support me on PayPal

Stay Awesome,
Hussein
Comments

The only thing about this channel that makes me feel awful is that I didn't discover it earlier. The topics discussed here are not really common, but they are extremely important in my opinion. I haven't found anything that comes near this channel; some topics are so advanced that I hadn't even heard of them. It makes me feel like I know nothing, which is the best feeling ever. There is so much to learn! Thank you so much!

elultimopujilense

This idea of shifting the delay to the writer AND then using queues for writes is true architectural foresight. I love it.
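
A minimal sketch of that pattern, using only the Python standard library (the batch size, flush logic, and do_batch_insert placeholder are illustrative assumptions, not the video's implementation): the request path only enqueues, and a background worker absorbs the write latency.

    # Queueing writes so the caller never waits on the database.
    import queue
    import threading
    import time

    write_queue = queue.Queue()

    def do_batch_insert(rows):
        # Placeholder for the real bulk INSERT against the database.
        print(f"flushing {len(rows)} rows")

    def writer_loop():
        batch = []
        while True:
            try:
                batch.append(write_queue.get(timeout=1.0))
            except queue.Empty:
                pass
            if len(batch) >= 100 or (batch and write_queue.empty()):
                do_batch_insert(batch)  # the delay lives here, not in the request path
                batch = []

    threading.Thread(target=writer_loop, daemon=True).start()

    # The "fast" path: callers just enqueue and return immediately.
    write_queue.put({"user_id": 1, "action": "follow", "target_id": 2})
    time.sleep(2)  # give the background writer a moment in this demo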

zacharythatcher

I'm a junior software engineer working my way toward specializing in backend. I always learn new concepts from your videos. This one in particular hits home for me because I've worked on a couple of projects where I had to make the database design choices. I found it quite difficult to make the right choices because I always end up building a search endpoint with full-text search and other search parameters.

To put it into context, the databases I had problems with were filled with recipes. I had multiple tables with 100k+ entries containing only IDs (the problem you mentioned). After two projects I switched to the last idea you mentioned, the list/JSON column, and that one worked best for me: not only does it avoid searching through a big table, it also saves me an extra query to another table.

This is kind of irrelevant to this video, but when implementing full-text search I think it's better to go with PostgreSQL rather than MySQL, since it supports GIN and GiST indexes and fuzzy searching, which can help build a nice, affordable, and quick solution for medium-sized databases.
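
A minimal sketch of both ideas, assuming a hypothetical PostgreSQL recipes table and psycopg2 (table, column, and connection details are all illustrative): tags live in a JSONB column instead of a separate join table, and full-text search goes through a GIN index.

    # Illustrative only: a recipes table with a JSONB tag list (denormalized,
    # no join table) and a GIN-backed full-text search index.
    import json
    import psycopg2

    conn = psycopg2.connect("dbname=recipes user=app")  # assumed connection string
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS recipes (
                id    BIGSERIAL PRIMARY KEY,
                title TEXT NOT NULL,
                body  TEXT NOT NULL,
                tags  JSONB NOT NULL DEFAULT '[]'
            );
        """)
        # GIN index over the tags array: fast containment checks, no join table.
        cur.execute("CREATE INDEX IF NOT EXISTS idx_recipes_tags ON recipes USING GIN (tags);")
        # GIN index over a tsvector expression: full-text search on title + body.
        cur.execute("""
            CREATE INDEX IF NOT EXISTS idx_recipes_fts
                ON recipes USING GIN (to_tsvector('english', title || ' ' || body));
        """)

        cur.execute(
            "INSERT INTO recipes (title, body, tags) VALUES (%s, %s, %s)",
            ("Shakshuka", "Eggs poached in spiced tomato sauce", json.dumps(["eggs", "breakfast"])),
        )
        # Find recipes tagged 'eggs' whose text matches 'tomato'.
        cur.execute("""
            SELECT id, title FROM recipes
            WHERE tags @> %s::jsonb
              AND to_tsvector('english', title || ' ' || body) @@ plainto_tsquery('english', %s)
        """, (json.dumps(["eggs"]), "tomato"))
        print(cur.fetchall())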

Keep doing the nice work.

achraf

Man, your content is GOLD. I come from a front-end stack and used to underestimate the work involved with databases, but your content has helped me understand the pitfalls of backend engineering.

andreivilla

I'm a front-end developer who acquired backend skills, all because of the good content from this channel. Amazing content and energy, thank you 😊

PiyushChauhan

Your channel helps not only software engineers, but also data analysts like me who work a lot with data engineering. Thanks! Greetings from Brazil.

joaopedromoreiradeoliveira

Yet another great video! Truly educational, even for someone who has been in the game for over 15 years. 👏

t

Couldn't have explained it more simply! Big ups, dude!!

OmarBravotube

I really like listening to you while doing other stuff, like driving, eating, or walking. I'm always being productive and learning new things on the side from your videos. Habibi ❤

Bilo_

Generally, it's helpful to think in terms of the read and write paths of your data. On the read side, on top of partitioning you can add Bloom filters to quickly test whether a value exists or not, reducing the need to search the B-tree or other persistent data structures.
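
A toy Bloom filter along those lines, using only the Python standard library (the bit-array size and number of hashes are arbitrary): a cheap membership pre-check before touching the on-disk structure.

    # Toy Bloom filter: answers "definitely not present" or "maybe present" so we
    # can skip probing the B-tree/SSTable when the answer is a definite no.
    import hashlib

    class BloomFilter:
        def __init__(self, size_bits=1 << 20, num_hashes=5):
            self.size = size_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(size_bits // 8)

        def _positions(self, key):
            for i in range(self.num_hashes):
                digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
                yield int.from_bytes(digest[:8], "big") % self.size

        def add(self, key):
            for pos in self._positions(key):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def might_contain(self, key):
            return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))

    bf = BloomFilter()
    bf.add("user:42")
    print(bf.might_contain("user:42"))   # True  (maybe present -> go check the index)
    print(bf.might_contain("user:99"))   # almost certainly False (skip the disk lookup)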

nilanjansarkar

Good video. The last trick in the video is called denormalization. Also, as soon as you introduce sharding you also need to add replication, because the probability of failures increases.

ami

I had to deal with tables with a few billion records per month, and MySQL's MERGE engine (merge tables) let us slice and dice them any way we needed. You can even have more specific indexes on the underlying tables themselves, as long as each index defined on the MERGE table exists in all of them.

The downside of MERGE tables is that they multiply the open file handles on the system, which can be tricky for a machine also doing public networking, but with the latest kernels you can push the limits pretty far.

High memory makes a huge difference, of course.
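
A minimal sketch of that setup, assuming monthly MyISAM tables and the mysql-connector-python driver (table names, columns, and credentials are illustrative): the MERGE table unions the monthly tables so they can be queried as one.

    # Illustrative MERGE-table setup: monthly MyISAM tables plus a MERGE table
    # that unions them.
    import mysql.connector

    conn = mysql.connector.connect(user="app", password="secret", database="logs")  # assumed
    cur = conn.cursor()

    monthly_ddl = """
        CREATE TABLE IF NOT EXISTS {name} (
            id      BIGINT NOT NULL,
            created DATETIME NOT NULL,
            msg     TEXT,
            INDEX (created)
        ) ENGINE=MyISAM
    """
    for name in ("events_2024_01", "events_2024_02"):
        cur.execute(monthly_ddl.format(name=name))

    # The MERGE table exposes the monthly tables as one; its indexes must exist
    # in every underlying table. New rows go to the last table (INSERT_METHOD=LAST).
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events_all (
            id      BIGINT NOT NULL,
            created DATETIME NOT NULL,
            msg     TEXT,
            INDEX (created)
        ) ENGINE=MERGE UNION=(events_2024_01, events_2024_02) INSERT_METHOD=LAST
    """)

    cur.execute("SELECT COUNT(*) FROM events_all WHERE created >= '2024-02-01'")
    print(cur.fetchone())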

videosforthegoodlife

I've gotten many database-side solutions from your videos. Thanks for your support.

rajendiranvenkat

Two extra ideas:
1. Table archiving: most large tables come from time-series records -- just archive the old records into separate tables and keep the live table small (see the sketch below).
2. Use modern databases that are more scalable than traditional single-host databases: CockroachDB, Spanner, Aurora, TiDB, Fauna, etc.
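
A minimal sketch of the archiving idea in the first point, assuming an existing PostgreSQL orders table and psycopg2 (all names, the cutoff, and the connection string are illustrative): move old time-series rows into an archive table inside one transaction, keeping the live table small.

    # Illustrative archiving job: move rows older than a cutoff from the live
    # "orders" table (assumed to exist) into an archive table, in one transaction.
    import psycopg2

    conn = psycopg2.connect("dbname=shop user=app")  # assumed connection string
    with conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS orders_archive (LIKE orders INCLUDING ALL);")
        # DELETE ... RETURNING feeds the moved rows straight into the archive
        # table, so a row is never in both tables or in neither.
        cur.execute("""
            WITH moved AS (
                DELETE FROM orders
                WHERE created_at < now() - interval '1 year'
                RETURNING *
            )
            INSERT INTO orders_archive SELECT * FROM moved;
        """)
        print(f"archived {cur.rowcount} rows")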

aXUTLO

Thanks for the video. I used the YouTube 'applaud' feature for the first time with a small token of 100 rupees. Hope you received it. Keep going 🙏🙏🙏

dinakaranonline

Woah, the JSON method literally blew my mind.

Sarwaan

Fabulous content on your channel. Subscribed. Thanks!! Keep up the good work :)

rashmidewangan

I believe the last concept is actually called denormalization. Another option could be considering NoSQL.
By the way, you are awesome

alichoobdar

Maybe I didn't understand the second-to-last section about eliminating the need to update both ends of a connection, but that solution will break down when person A, who is following person B, closes their account, since the information about who follows person B lives only in person B's record. So when person A leaves the platform, we won't know whose records to update.
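
To make the concern concrete, here is a tiny sketch with made-up in-memory data (not from the video): with only a per-user followers list, deleting person A means scanning every user's list, unless you also keep a reverse "following" list and accept updating both ends again.

    # Illustration of the concern above: if we only store "who follows me" on
    # each user, removing a deleted follower requires scanning every record.
    followers = {
        "B": ["A", "C"],   # people who follow B
        "C": ["A"],
        "A": [],
    }

    def delete_account(user):
        followers.pop(user, None)
        # Without a reverse index ("who does A follow?"), this is a full scan
        # of every record -- the kind of cost the single-sided design avoids on writes.
        for who, their_followers in followers.items():
            if user in their_followers:
                their_followers.remove(user)

    delete_account("A")
    print(followers)   # {'B': ['C'], 'C': []}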

arianseyedi

I love you, man. You are so crystal clear. You are a legend.

insearchof