The Lesson About GUID IDs I Learned the Hard Way


I will give you the database system ingredients, the most ordinary ones, but if you combine them carelessly, you will end up with a nasty performance bug in your database. A puzzle? Yes, it's a puzzle.
Hear the ingredients first. GUID IDs are common both in relational and NoSQL databases. One can generate them in a distributed system and never produce a collision, which is nice. SQL Server supports GUIDs natively. Does this sound familiar?
Add one more detail, and your perspective might change: SQL Server creates a clustered index on the primary key column by default.
Now, stop for a moment and think. An application generates GUID values and stores them as the keys of new rows in a SQL Server table with a clustered index on that key. The stage is set for a disaster that only becomes apparent once the table grows large, and by then, correcting it is just as costly.
This video demonstrates the dangers of using GUID IDs with the default SQL Server behavior and the order of steps to correct the issue once it has made its way into your solution. You will also learn how to define your entities so that persisting them never causes this problem.
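
The effect described above can be illustrated with a toy model. The sketch below is not SQL Server's storage engine: `PAGE_CAPACITY`, `insert_key`, and `simulate` are hypothetical names, and a "page" is just a small sorted list. It only assumes that rows are kept in key order, that a full page splits in half when a key lands in its middle, and that a key larger than every existing key starts a fresh page without moving rows.

```python
import bisect
import random
import uuid

PAGE_CAPACITY = 8  # hypothetical rows per page; real pages hold far more


def insert_key(pages, key):
    """Insert key into a list of sorted pages; return True if a page was split."""
    if not pages:
        pages.append([key])
        return False
    # Locate the page whose key range should contain the new key.
    first_keys = [page[0] for page in pages]
    idx = max(bisect.bisect_right(first_keys, key) - 1, 0)
    page = pages[idx]
    if len(page) < PAGE_CAPACITY:
        bisect.insort(page, key)
        return False
    if idx == len(pages) - 1 and key > page[-1]:
        # Ever-increasing key: start a new page at the end, no rows are moved.
        pages.append([key])
        return False
    # The key lands inside a full page: move the upper half to a new page.
    mid = len(page) // 2
    new_page = page[mid:]
    del page[mid:]
    pages.insert(idx + 1, new_page)
    bisect.insort(new_page if key >= new_page[0] else page, key)
    return True


def simulate(keys):
    """Return (number of page splits, average page fill ratio) for the keys."""
    keys = list(keys)
    pages = []
    splits = sum(insert_key(pages, key) for key in keys)
    fill = len(keys) / (len(pages) * PAGE_CAPACITY)
    return splits, fill


rng = random.Random(42)  # seeded so the run is repeatable
random_guids = [uuid.UUID(int=rng.getrandbits(128)) for _ in range(2000)]

print("sequential keys:", simulate(range(2000)))
print("random GUIDs:   ", simulate(random_guids))
```

In this model, sequential keys never split a page and leave every page full, while random GUIDs keep splitting pages and leave them partly empty, which is the fragmentation the description warns about.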

00:00 Intro
01:08 The Wrong Clustered Index
06:15 Turning to Nonclustered Index
12:15 The Correct New Entity
14:48 Conclusion

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
👨 About Me 👨
Hi, I’m Zoran. I have more than 20 years of experience as a software developer, architect, team lead, and more. I have been programming in C# since its inception in the early 2000s. Since 2017, I have been publishing professional video courses at Pluralsight and Udemy, and by this point, there are over 100 hours of the highest-quality videos you can watch on those platforms. On my YouTube channel, you can find shorter video forms focused on clarifying practical issues in coding, design, and architecture of .NET applications.❤️
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
⚡️RIGHT NOTICE:
The Copyright Laws of the United States recognize a “fair use” of copyrighted content. Section 107 of the U.S. Copyright Act states: “Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright.” This video and our YouTube channel, in general, may contain certain copyrighted works that were not specifically authorized to be used by the copyright holder(s), but which we believe in good faith are protected by federal law and the fair use doctrine for one or more of the reasons noted above.

#csharp #dotnet #sqlserver
Comments

This channel is proof that sometimes the most valuable advice is not on some big channel with millions of subscribers, but on a small channel like this. I cannot express how grateful I am for all your content and advice. You are extremely valuable to the programming community. Thank you so much.

FBarbarian

Thanks Zoran. A note, though, that there are a number of advantages to a clustered index. Some watching this video might be led to believe that it's about insertion performance, but there is more to it. In fact, I would suggest it's about read performance. Many large tables represent transactions (e.g. an order); over time the older transactions are rarely accessed, and it's the active, current ones that are accessed concurrently. Having these records physically "clustered" greatly increases the chances that they are already in memory when accessed. It also reduces the read effort a little because the leaf index pages *are* the record pages.

So whether to use a clustered index or not is a decision to be made carefully, and not just on the basis that a developer may prefer to allocate the ID themselves rather than let the DB do it. That said, there are approaches that give the best of both worlds, that is, keep the index clustered and generate keys in the application that are highly likely to be sequential.

codingbloke
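
One way to read "keys generated in the application that are highly likely to be sequential" is the so-called COMB approach: keep most of the GUID random, but put a timestamp into the bytes the database compares first. The sketch below is only an illustration under assumptions. `comb_guid` and `leading_sort_group` are hypothetical names, the value is not a standards-compliant UUID (no version bits are set), and it relies on SQL Server treating the last six bytes of a `uniqueidentifier` as the most significant group when sorting.

```python
import os
import time
import uuid


def comb_guid(unix_ms=None, random_part=None):
    """Build a GUID with 10 random bytes followed by a 6-byte millisecond timestamp."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    if random_part is None:
        random_part = os.urandom(10)
    if len(random_part) != 10:
        raise ValueError("random_part must be exactly 10 bytes")
    # Keep the low 48 bits of the timestamp, big-endian, as the last 6 bytes.
    tail = (unix_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + tail)


def leading_sort_group(guid):
    """Return the last 6 bytes, assumed here to be compared first by SQL Server."""
    return guid.bytes[10:16]


earlier = comb_guid(unix_ms=1_700_000_000_000)
later = comb_guid(unix_ms=1_700_000_000_001)
print(earlier, later)
print(leading_sort_group(earlier) < leading_sort_group(later))
```

Keys created in the same millisecond still land in random order relative to each other, so this makes inserts mostly, not strictly, sequential.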

Protip for people who want to drop primary keys:
If you make a change to it in SSMS' Table designer, you can have it generate a script, and it will output all the corresponding drop/create statements for you.
I use this a lot when I clean up databases, as it's often faster to reverify foreign keys (minutes) than to leave them active during mass data operations (days).
Especially if you have foreign keys that aren't a primary predicate in your indexes, which is often the case.

billybob

Thank you Zoran! Nice video. I always try to avoid using these GUIDs and use ints instead, but sometimes I have to use them.

torrvic

Thank you for making this topic accessible! The explanation on the performance pitfalls of using GUIDs with clustered indexes was particularly insightful.

OscarAgreda

Thank you very much for this video, Zoran! I was not aware of the nature of clustered indices. The explanation helped me to revise the DB configuration on a current project I am working on. Very helpful!

arkord

Thank you for sharing, Zoran.
As you're aware, removing the clustered primary key entails additional overhead for each key retrieval. While this action resolves index fragmentation, it introduces an I/O penalty with each key access.
An alternative solution I propose is the adoption of ULID or UUIDv7 (with a personal preference for the former). These options ensure the monotonicity of generated GUIDs. This quality not only renders clustered primary keys feasible and time-sequential processing effective, but also enriches the ID with additional time-information that could prove beneficial in some scenarios.
You would have saved a lot of migrations ;)

GiovanniCostagliola
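
To make the UUIDv7 suggestion concrete, here is a minimal sketch of the bit layout: a 48-bit Unix timestamp in milliseconds, then the version and variant bits, with the rest random. The function name `uuid7_sketch` and its parameters are hypothetical, and this is not a replacement for a vetted library. It also does not guarantee ordering within a single millisecond, which ULID's monotonic mode handles by incrementing the random part. One caveat worth checking before relying on it: SQL Server orders `uniqueidentifier` values starting from the last six bytes, so a leading timestamp does not by itself give ascending order in that column type.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None, random_bytes=None):
    """Build a UUIDv7-style value: 48-bit ms timestamp followed by random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    if random_bytes is None:
        random_bytes = os.urandom(10)
    if len(random_bytes) != 10:
        raise ValueError("random_bytes must be exactly 10 bytes")
    # Timestamp occupies the top 48 bits, 80 random bits fill the rest.
    value = (unix_ms & ((1 << 48) - 1)) << 80
    value |= int.from_bytes(random_bytes, "big")
    # Overwrite the 4 version bits with 7 and the 2 variant bits with binary 10.
    value = (value & ~(0xF << 76)) | (0x7 << 76)
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return uuid.UUID(int=value)


base_ms = 1_700_000_000_000  # arbitrary example timestamp
ids = [uuid7_sketch(unix_ms=base_ms + i) for i in range(5)]
for value in ids:
    print(value)
# Python compares UUIDs as big-endian integers, so later timestamps sort later.
print(ids == sorted(ids))
```

Because the timestamp leads, any store that compares the 16 bytes from left to right sees these IDs arrive in roughly ascending order.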

The brilliance of this video cannot be overstated. Well done.

esra_erimez

I cannot believe how much I learned in 15 minutes. This is one of your best.

jordanfarr

I just want to emphasize how professional and helpful this video was. Please keep it up and thank you. 👏

zhenglaowang

"A lot of data to move when inserting rows in the middle of the clustered index" -> this is just a page split, and it only happens when a page is full. Not super-ideal, but there is an upper bound to the cost.

geofftnz

Your voice and manner of talking are amazing. Thanks for the valuable advice.

mangodude-nqsu

Really useful, as I had wrongly created a clustered GUID PK, the way SQL Server suggests by default. Thanks a lot.

AlexUkrop

I am entranced by your voice, @zoran-horvat! It was like you were telling me a story. I could listen to you all day long.

Also, great informational video :)

TimSchmidt-ku

Clear and concise explanation; best video I've seen in quite a while 💯

nessitro

I love these videos. While I don't code in C#, I do find them remarkably informative and useful.

wsollers

TL;DR: if you can't ensure ascending GUID order, don't use a clustered index. You're welcome.

kocot.

Thank you so much...I encountered this exact problem....when the table got big...

admindravid

Fantastic! Your voice makes it sound like a fairy tale :)

MixturaLife

Could the original setup work, but with Ulid instead of Guid? So we can keep the clustered index?

zbaktube