I loaded 100,000,000 rows into MySQL (fast)

Watch me wrangle over 100,000,000 rows of data and load it into MySQL.

💬 Follow PlanetScale on social media
Comments

you are doing it right PlanetScale. Promoting your brand in the background while giving real value to the people watching these videos. Congrats.
Even letting different people present is well done (I got used to Aaron though :D)

tcarreira

Didn't even realise that this is a company account. Loved that video

bozgi

PlanetScale is LITERALLY the only company I'm subbed to on YouTube. Your content is THAT good

dudaseifert

Using the "teach a man how to fish" rule in product marketing is one hell of a genius idea. What a win-win!

k_gold

Update: I also tried this on a Mac Mini M1. This time, instead of using all 17 columns, I did as you did and only imported 6 of them. PostgreSQL took 4 m 39 s 810 ms this time :) Almost half of your MySQL timing (and again, this is in a Docker container).
I knew that PostgreSQL should be better. I fell in love with PostgreSQL once more. Thanks.
Ah, BTW, maybe this should be considered cheating: I use the open-source PostgreSQL extension Citus (but a single node in a Docker container - it would be insane with multiple nodes).
(I guess PlanetScale is also using the Go project Vitess and should be fast on multiple nodes)

ahmettek
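
For readers curious about the PostgreSQL side of this comparison: PostgreSQL's usual bulk-load path is COPY (or psql's client-side \copy), which plays roughly the same role as MySQL's LOAD DATA LOCAL INFILE. A minimal sketch, assuming a hypothetical games table and a headered CSV; the file path and column names are placeholders, not the commenter's actual setup:

    -- Server-side bulk load: the file must be readable by the PostgreSQL server process.
    COPY games (white, black, result, white_rating, black_rating, time_control)
    FROM '/var/lib/postgresql/import/games.csv'
    WITH (FORMAT csv, HEADER true);

    -- Client-side alternative from psql, reading the file on the client machine:
    -- \copy games FROM 'games.csv' WITH (FORMAT csv, HEADER true)

Citus generally accepts the same COPY syntax for distributed tables, so the single-node vs. multi-node difference is in data placement, not in the command itself.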

Aaron, you look different!

jk, always love to see new PlanetScale content! ❤

codewithantonio

Love that you kept errors/issues in the video! Shows that even professionals make simple mistakes :)

HoverCatzMeow

as a frequent user of that same lichess database, I would really appreciate that C program you wrote :)

miguel-espinoza

Ben! My former lecturer at the UofA. He is an amazing teacher, so I'm glad I can still learn from him 🎉

robbyface

09:23 - Am I the only one who noticed that you got an error at row one, not even importing any data :)

Anyway, awesome video :) I always get excited when I see a new one uploaded :)

DoubleM

Great video, I didn't know about the LOAD DATA LOCAL INFILE SQL command 👍
I would have loved a comparison with an INSERT INTO that inserts multiple rows at once instead of one insert per game, to see if it's faster or slower than LOAD DATA LOCAL INFILE

Arkounay
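
For anyone who, like the commenter above, hasn't used it: LOAD DATA LOCAL INFILE streams a client-side file into a table in one statement instead of issuing one INSERT per row. A minimal sketch with placeholder file, table, and column names (not the exact command from the video); the LOCAL variant requires local_infile to be enabled on both the server and the client:

    LOAD DATA LOCAL INFILE '/path/to/games.csv'
    INTO TABLE games
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES    -- skip the CSV header row
    (white, black, result, white_rating, black_rating, time_control);

The multi-row INSERT comparison the commenter asks about is sketched a few comments further down.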

100k inserts in 1.7s. Yep, your app with 5 active users and 1000 inserts per day definitely needs Cassandra.

IvanRandomDude

Please use the torrent next time. That'll save them bandwidth and probably also be faster.

thejonte

Would really love a playlist dedicated to performance tips

sounakmondal

Great video, as always. Thank you. I have a question: why did you use an EC2 instance to load the data into PlanetScale? Couldn't you have done it directly from your local Mac?

alexrusin

I would be curious to see the performance difference between multiple independent INSERT statements and one INSERT statement with many values. In my experience, it is much, much faster, though I have never tried importing a dataset quite this large.

MrPudgyChicken
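
The multi-row INSERT that a couple of commenters ask about looks roughly like the sketch below. The table and values are hypothetical; the point is that one multi-row statement amortizes parsing, the network round trip, and (with autocommit on) the per-statement commit across the whole batch:

    -- One row per statement: N parses, N round trips, N commits with autocommit on.
    INSERT INTO games (white, black, result) VALUES ('alice', 'bob', '1-0');
    INSERT INTO games (white, black, result) VALUES ('carol', 'dave', '0-1');

    -- Many rows per statement: the same overhead is paid once per batch.
    INSERT INTO games (white, black, result) VALUES
      ('alice', 'bob',   '1-0'),
      ('carol', 'dave',  '0-1'),
      ('erin',  'frank', '1/2-1/2');

In practice the batch size is capped by max_allowed_packet, so very large imports are usually chunked into batches of a few thousand rows per statement.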

I have used SQL Loader in one of the pipelines I developed for the company I work for. It reduced the processing time from 30 minutes to 2 minutes.

atharqadri

I had the same problem, but with 300 million rows. The real pain was hidden characters in the data and commas in the wrong places, so importing that data into the database took over 6 hours and I had to run the script overnight. To optimize it, we compared the changes against the first import, so the second and subsequent imports took only 1-5 minutes.

UTUBMRBUZZ

8:42 - It's totally understandable why it takes forever: each ';' is a commit, and that takes processing power. A bulk load of the data would be best, but the direct loading is great too

Richard_GIS
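
On the commit point raised above: with autocommit enabled (the MySQL default), every individual statement is its own transaction and is durably flushed when it commits, which is a large part of the per-row cost. A common workaround, sketched here with a hypothetical games table, is to wrap a batch of inserts in one explicit transaction:

    START TRANSACTION;
    INSERT INTO games (white, black, result) VALUES ('alice', 'bob', '1-0');
    INSERT INTO games (white, black, result) VALUES ('carol', 'dave', '0-1');
    -- ... many more inserts ...
    COMMIT;  -- one durable flush for the whole batch instead of one per statement

LOAD DATA INFILE benefits from the same idea: the entire file goes in as a single statement, so there is only one commit at the end.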

Very good. My brutal feedback is that Aaron sometimes has too many opinions sprinkled into his videos, while this fella has almost none. A middle ground would be, like, wow, chef's kiss.

Tjkrusinski