What Is the Fastest Way To Do a Bulk Insert? Let’s Find Out


What is the fastest way to do a bulk insert into a SQL database? We're going to find out in this video. We will compare EF Core, raw SQL with Dapper, EF Core Bulk Extensions, and the SqlBulkCopy class. I'll show you the results for inserting 100, 1,000, 10,000, 100,000, and 1,000,000 records.
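For context, a comparison like this is typically wired up with BenchmarkDotNet. Here is a minimal skeleton; this is a generic sketch, not the video's actual code, and the class and method names are made up:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class BulkInsertBenchmarks
{
    // The record counts measured in the video.
    [Params(100, 1_000, 10_000, 100_000, 1_000_000)]
    public int RecordCount;

    [Benchmark]
    public void EfCore_AddAndSave()
    {
        // dbContext.Users.AddRange(users); dbContext.SaveChanges();
    }

    [Benchmark]
    public void SqlBulkCopy_WriteToServer()
    {
        // bulk.WriteToServer(table);
    }
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<BulkInsertBenchmarks>();
}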

Fast SQL Bulk Inserts With C# and EF Core

Join my weekly .NET newsletter:

Read my Blog here:

Chapters
0:00 What is a Bulk Insert?
2:28 Implementing the Bulk Insert benchmarks
9:54 Examining the Bulk Insert benchmark results
Comments

Thank you for the video. I'd like to see a video on bulk upserts/merging data.

yjgrwce

Milan, just wanted to say: thanks to you and your C# videos, I managed to land a job. Appreciate ya pal ;)

alexanderst.

I like using the BulkCopy function; it is amazingly fast for importing large datasets. One thing to note is that the column names specified for the database side are case sensitive. If there's a case mismatch in the column names, the import will fail. You can also eke out even more performance by tweaking the batch size in BulkCopy using `bulk.BatchSize = 10_000;`. Actual performance will vary based on how many columns you're inserting.
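A minimal sketch of what that might look like (the destination table and column names here are made up):

using System.Data;
using Microsoft.Data.SqlClient;

// Assumes "table" is a DataTable already populated with the rows to insert.
using var bulk = new SqlBulkCopy(connectionString)
{
    DestinationTableName = "dbo.Users",
    BatchSize = 10_000 // rows per round trip; tune for your workload
};

// Mappings are case sensitive on the destination side.
bulk.ColumnMappings.Add("Email", "Email");
bulk.ColumnMappings.Add("FirstName", "FirstName");

bulk.WriteToServer(table);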

anonymoos

There is another way to speed up a bulk insert in some specific cases (for example, during a periodic ETL process when an entire table or partition should be fully cleaned and recreated from scratch). In those cases the bulk insert can be slowed down by indexes, constraints, concurrent access, etc. The solution is to bulk insert into a temporary table that has no indexes or constraints, using a READ UNCOMMITTED transaction; then build all the needed indexes; then do a partition swap with the main table/partition. Another advantage of this approach is that, up until the last step, the data in the original table stays fully available, and the partition swap is an almost instant, atomic operation.
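A rough sketch of that flow (all table, index, and partition names are illustrative, and the target table is assumed to be partitioned appropriately):

using Microsoft.Data.SqlClient;

using var connection = new SqlConnection(connectionString);
connection.Open();

// 1. Bulk insert into a heap staging table with no indexes or constraints.
using (var bulk = new SqlBulkCopy(connection) { DestinationTableName = "dbo.Users_Staging" })
{
    bulk.WriteToServer(table);
}

// 2. Build the indexes the target table requires, after the data is loaded.
using (var index = new SqlCommand(
    "CREATE UNIQUE CLUSTERED INDEX IX_Users_Id ON dbo.Users_Staging (Id);", connection))
{
    index.ExecuteNonQuery();
}

// 3. Swap the staged data in; SWITCH is a metadata-only, near-instant operation.
using (var swap = new SqlCommand(
    "ALTER TABLE dbo.Users_Staging SWITCH TO dbo.Users PARTITION 1;", connection))
{
    swap.ExecuteNonQuery();
}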

FolkoFess

Thank you, Milan! This video deserves to be at the top.

vasiliylu

Thanks for doing the research!
Very insightful investigation.

-INCGNIT-

It also looks like there is a library called Dapper Plus that has a bulk insert feature as well. It's also a commercial, paid library.

pilotboba

Really interesting! Thanks.
I think you can pass the array directly to Dapper without converting to anonymous objects. Anyway, it won't change the benchmark results much.
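For reference, Dapper does accept a collection directly: it unrolls the enumerable and executes the statement once per element, so no anonymous-object projection is needed when the property names already match (column names here are illustrative):

using Dapper;

// Executes the INSERT once per element of "users".
connection.Execute(
    "INSERT INTO Users (Email, FirstName) VALUES (@Email, @FirstName)",
    users);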

giammin

A few things.

You never adjusted the batch size for EF Core. It is possible to speed up inserts by increasing the batch size; I think by default it is 100.
Bulk copy also has a way to set the batch size. By default I believe it is set to 0, which means all rows, but it's recommended to set it (a sketch of matching both settings follows below).

Bulk copy does a non-transacted insert by default, so if there is an issue there is no way to roll it back. There is an option to have it use a transaction, but I assume that will slow it down a bit.

I'm curious whether the speeds would be closer if you matched the bulk copy and EF Core batch size settings and enabled internal transactions in bulk copy.

I'm not sure, but did your code create the collection each time? Perhaps to remove that overhead you could create the user collection in the constructor?
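A sketch of matching the two batch sizes and enabling the internal transaction (the context type and the value of 1_000 are made up; only the APIs are real):

using Microsoft.Data.SqlClient;
using Microsoft.EntityFrameworkCore;

// EF Core: cap the number of statements per batch.
services.AddDbContext<AppDbContext>(options =>
    options.UseSqlServer(connectionString, sql => sql.MaxBatchSize(1_000)));

// SqlBulkCopy: same batch size, with an internal transaction so a failing
// batch rolls back instead of leaving a partial insert behind.
using var bulk = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.UseInternalTransaction)
{
    DestinationTableName = "dbo.Users",
    BatchSize = 1_000
};
bulk.WriteToServer(table);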

pilotboba

Another alternative, if you already have files ready to import, is to use the OPENROWSET command inside SQL Server:

INSERT INTO Table (Col1, Col2, ...)
SELECT Col1, Col2, ...
FROM OPENROWSET(
    BULK 'c:\myfile.txt', FORMATFILE = 'c:\format.xml'
) AS rows;

In the XML format file, you define the rules for how the file you want to import is formatted (fixed size, comma split, etc.).

rsrodas

Surprisingly, Dapper doesn't perform well. Still, I would like to see the results when using Dapper with the SQL bulk insert command.

I personally have used the EFCore.Extensions library, which is a paid one, to do the bulk inserts. My company bought a license for it, and it saved many development days for things like bulk merge and bulk synchronize operations.
It would be interesting to compare its performance to the SqlBulkCopy class.

antonmartyniuk

Love this. One quick question: for the EF Core approaches, would the performance be consistent on Postgres as well as SQL Server?

islandparadise

How can Dapper be 5 times slower than EF Core's AddAll at 1M records? This doesn't make sense at all.

lolyasuo

Is it possible to check Dapper executing a stored procedure that accepts a UDT table as a parameter?
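That is possible; Dapper can pass a DataTable as a table-valued parameter. A sketch, where the table type and stored procedure are hypothetical:

using System.Data;
using Dapper;

// Assumed to exist on the server:
//   CREATE TYPE dbo.UserTableType AS TABLE (Email NVARCHAR(256), FirstName NVARCHAR(64));
//   CREATE PROCEDURE dbo.InsertUsers @Users dbo.UserTableType READONLY AS ...

var tvp = new DataTable();
tvp.Columns.Add("Email", typeof(string));
tvp.Columns.Add("FirstName", typeof(string));
foreach (var user in users)
{
    tvp.Rows.Add(user.Email, user.FirstName);
}

connection.Execute(
    "dbo.InsertUsers",
    new { Users = tvp.AsTableValuedParameter("dbo.UserTableType") },
    commandType: CommandType.StoredProcedure);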

harshakumar

Can you try concatenating the insert query, then running it as a raw SQL query with EF, and see the results?
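For anyone who wants to try it, a sketch of the concatenated-statement approach (table and column names are made up). Note that SQL Server caps a single VALUES list at 1,000 row constructors, and building literals by string is injection-prone, so this is benchmark-only code:

using System.Linq;
using System.Text;

var sql = new StringBuilder("INSERT INTO Users (Email, FirstName) VALUES ");
sql.AppendJoin(", ", users.Select(u => $"('{u.Email}', '{u.FirstName}')"));

// ExecuteSqlRaw sends the whole statement in one round trip.
dbContext.Database.ExecuteSqlRaw(sql.ToString());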

xtazyxxx

Have you tried OPENJSON or another JSON structure with a raw SQL query?
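A sketch of the OPENJSON variant (column names and types are illustrative): serialize the rows once on the client and let SQL Server shred the JSON server-side.

using System.Text.Json;
using Dapper;

var json = JsonSerializer.Serialize(users);

connection.Execute(
    @"INSERT INTO Users (Email, FirstName)
      SELECT Email, FirstName
      FROM OPENJSON(@json)
      WITH (Email NVARCHAR(256), FirstName NVARCHAR(64));",
    new { json });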

musaalp

What do you think about bulk updates? Can you run that benchmark for us?

belediye_baskani

Would the results be different when using Postgres? 🤔

dymber

All of the methods shown are slow compared to "load data from file". Whenever possible, use load-from-file for large data imports; it will load gigabytes of data within seconds. One of the best approaches in my experience is to create a temporary table for the data source, run the LOAD DATA FILE command, and then perform the inserts into the entity tables on the database server.
The only issue/drawback can be the network connection when loading large datasets.
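For SQL Server specifically, the counterpart to MySQL's LOAD DATA INFILE is the BULK INSERT statement. A sketch (the path and table names are made up, and note that the file path is resolved on the database server, not the client):

using Microsoft.Data.SqlClient;

using var command = new SqlCommand(
    @"BULK INSERT dbo.Users_Staging
      FROM 'C:\data\users.csv'
      WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);",
    connection);
command.ExecuteNonQuery();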

ExtremeTeddy

So DataTable is EF Core without a paid lib?

EzequielRegaldo