The effect of Random UUID on database performance

preview_player
Показать описание
In this video I whiteboard how UUIDs hurt write (and read performance) when used on secondary and primary indexes. UUIDv4 are the most popular but they are truly random. Compared to snowflakes, ULID or UUIDv7 or even UUIDv1.

0:00 UUIDv4
2:30 B+Tree Indexes and UUID
5:30 Random UUIDv4 Insert Workload
12:40 Ordered Insert Workload (UUID7/ULID, Sequence)
14:00 Shared buffer pool flushes
15:00 Shopify ULID use case
17:00 URL shortner UUIDs?

Discovering Backend Bottlenecks: Unlocking Peak Performance

Fundamentals of Backend Engineering Design patterns udemy course (link redirects to udemy with coupon)

Fundamentals of Networking for Effective Backends udemy course (link redirects to udemy with coupon)

Fundamentals of Database Engineering udemy course (link redirects to udemy with coupon)

Follow me on Medium

Introduction to NGINX (link redirects to udemy with coupon)

Python on the Backend (link redirects to udemy with coupon)

Become a Member on YouTube

Buy me a coffee if you liked this

Arabic Software Engineering Channel

🔥 Members Only Content


🏭 Backend Engineering Videos in Order

💾 Database Engineering Videos

🎙️Listen to the Backend Engineering Podcast

Gears and tools used on the Channel (affiliates)

🖼️ Slides and Thumbnail Design
Canva


Stay Awesome,
Hussein
Рекомендации по теме
Комментарии
Автор

The whole time I was wonder why not just insert on a new index rather than pointing to a UUID, that way it's always ordered. But then you said MySQL defaults to this at the end of the video. That's critical for the 'why' of this video

AffyisAffy
Автор

Actually, I was thinking about that question a couple of days ago
Thank you :)

ahmedalaaeldin
Автор

Yes when you age them out, they might leave gaps in the index tree but also reuse the gaps regularly. They might not “nowhere near each other” but always in between. Sp with larger pages the splits are less likely. (Of course IOT is to a good usecase)

berndeckenfels
Автор

Ive used UUID generated outside the database and stored as primary key varchar within the db. Its worked a lot better for over 100 million devices that I was working with. Primary goal was to use it as hashtables when use other systems like redis/dynamo etc. For lookups, with a bit off magic in UUID generation which is not being generated by DB, you can apply Bloom filters as well. I've been out of engineering field for a decade+ but these sort of videos are always much needed to discuss the fundamentals of how things work.

RealEvangelizer
Автор

This one cracked me up 09:59 😂

Great work as usual 👏
Please consider making a video on Pinecone and vector databases 🙏

gaml
Автор

Awesome! Thanks. Dialectic of randomness and order. Very beautiful.

vasiliynet
Автор

Never thought, Hussein will come with a pun in the video "that's what she said"

tempaccount
Автор

The silent joke at 10:00 That's what she said. Haha, classic.

AbhishekSingh-pudg
Автор

MySQL 8 uses UUID v1 (kind of compatible with UUID v2).

By using the UUID_TO_BIN()/BIN_TO_UUID() function with the optional second argument 'swap_flag' it will reformat the UUID before converting it to a BINARY(16). This will make them sequential (since UUID v1/v2 is based on the timestamp).

ddanielsandberg
Автор

10:00 would have never expected that joke from you 😂

RZhuAmpere
Автор

me usually used index "order by timestamp", thats why UUID (in this case, just for unique identity) not a problem.

Maman-Setrum
Автор

lowkey "That's what she said" reference at 9:57 😂

anuragbhagsain
Автор

Please do a video on vector databases.

mikestaub
Автор

The worst is using UUID with SQL Server; since the PK is Clustered; performance becomes awful very quickly.
I use Postgresql and never had performance issues with UUIDs.

OzoneGrif
Автор

What about distributed NO SQL DBs (document, key-value, etc) where the recommendation is to use partition keys with high cardinality and avoid sequential values because that will create “hot” partitions for inserts at least? Maybe its a different use case but it would be an interesting topic too

alegon
Автор

Will this concept be applied to columns that are Varchar and indexed? Because strings are also random like username, email, url slugs, etc.. If so, what is the workaround for storing string values as indexed?

usamaabubakar
Автор

I just randomly stumbled upon your video, and I've always had this itch about UUID performance. Your video really boosted my confidence in what I was thinking. Thanks a lot!

Oh, and quick question: Do databases like MariaDB, MySQL, or PostgreSQL automatically play nice with ULIDs? And how do they know to sort 'em out for indexing?

therealtuyen
Автор

Pro tip: watch at 1.5x speed. You're welcome

redpillsatori
Автор

His videos are so interesting but so slow. I can easily watch it at 2x speed.

shmmh
Автор

If your IDs are public and you make them sequential, be aware that third parties will know the number of entities created over a time period.

saggitt