How IEnumerable can kill your performance in C#

Hello everybody, I'm Nick, and in this video I will show you how IEnumerable can harm your application's performance. I will explain why it happens, what you can do about it, and how to deal with it in future scenarios.

Don't forget to comment, like and subscribe :)

#csharp #dotnet
Comments

I swear this is one of those things ReSharper has taught me with its warnings. I rarely see it now because I know better. Great explanation of multiple enumerations.

TheBreaded
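
The "possible multiple enumeration" warning mentioned in the comment above can be made visible with a small counter. This is only a sketch: `ReadNames` and `Runs` are hypothetical names standing in for an expensive source such as a file or an API call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultipleEnumerationDemo
{
    // Incremented every time the iterator body below starts running.
    public static int Runs;

    // Hypothetical stand-in for an expensive source (file, API, database).
    public static IEnumerable<string> ReadNames()
    {
        Runs++;
        yield return "Ann";
        yield return "Bob";
        yield return "Cy";
    }

    public static int RunsWhenLazy()
    {
        Runs = 0;
        IEnumerable<string> names = ReadNames();
        int count = names.Count();                 // first pass over the source
        foreach (var name in names) { }            // second pass over the source
        return Runs;
    }

    public static int RunsWhenMaterialized()
    {
        Runs = 0;
        List<string> names = ReadNames().ToList(); // the only pass over the source
        int count = names.Count;                   // property read, no enumeration
        foreach (var name in names) { }            // walks the in-memory list
        return Runs;
    }

    public static void Main()
    {
        Console.WriteLine($"lazy: source ran {RunsWhenLazy()} times");
        Console.WriteLine($"materialized: source ran {RunsWhenMaterialized()} times");
    }
}
```

The lazy version starts the source twice, the materialized one only once, at the cost of holding the list in memory.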

This enumeration style is called co-routine for those who didn't know. You basically have a function on hold that can give you the next element right when you need it 😄

Actually, this is a crazy efficient way to represent very long (even endless) streams, such as the indices from 1 to n. For n = int.MaxValue that is (2^31 - 1) * 4 bytes, so your PC would simply explode if you called ToList() on it, because that's about 8 GB of data. But a co-routine like Enumerable.Range() can do it with just 2 int variables, i.e. 8 bytes.

It really makes a huge difference, as you can keep this little chunk of 8 bytes in the faster cache levels of your CPU and crank on it like crazy. One ToList() too few or too many can make your program run for 2 hours instead of 1 ms 😅😅😅

marcotroster
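
As a rough illustration of the point above, the following sketch builds a pipeline over the full `Enumerable.Range(1, int.MaxValue)` but only ever pulls the few items it needs. `SumOfFirstSquares` is a made-up helper name; the memory figures are the commenter's, not measured here.

```csharp
using System;
using System.Linq;

public static class LazyRangeDemo
{
    public static long SumOfFirstSquares(int howMany)
    {
        // Range produces values on demand, so the roughly 8 GB list
        // that ToList() would need is never allocated.
        return Enumerable.Range(1, int.MaxValue)
            .Select(i => (long)i * i)
            .Take(howMany)   // stops pulling after 'howMany' items
            .Sum();
    }

    public static void Main()
    {
        Console.WriteLine(SumOfFirstSquares(3)); // 1 + 4 + 9 = 14
    }
}
```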

I've been programming with C# for about 15 years and there are parts of it that still mystify me. Your example of obtaining a count via an IEnumerable reminded me of how I ran into a similar situation on my own. In my case I was loading over 100k records. EF was new to me and I couldn't understand why my app was taking a performance hit until I discovered the difference between IEnumerable and IQueryable. From then on it forced me to take into consideration the overall purpose of the program and how to use IEnumerable properly. You are very well versed in the programming language, more than I am after working with C# for so long.

On a side note, back when I was learning programming in 1991 I asked a senior developer of our mainframe why people are sloppy with their code. He told me that it will only get worse because as computers get faster it will compensate for bad coding practices and the end result will be lazy programmers. I came from learning to program on a mainframe environment where every byte counted. We ran accounts payable and payroll for 300 employees. All of it was done on a 72 megabyte hard drive.

rafaelm.

When I started to think about it more deeply, this system is actually very very good:
If we have some enumerable thing A given for a consumer B, how could B assume that it has enough memory to hold all elements of A?
Ans: It cannot, and thus it protects itself with this solution of multiple enumerations: if the file read in this video were gigantic (let's assume millions of lines), then multiple enumeration IS desired!
The solution is just to use IReadOnlyList when the data fits in memory, since its elements are already stored before you enumerate it.

mastermati

A common recurring problem among programmers is not knowing how the code they're using works. At the very least, they should understand what the API commits to doing. Deferred enumeration of IEnumerable is a great feature in C#, but if you're using any API that exposes an IEnumerable object, you should always assume that you need to enumerate at some point, unless your objective is to merely chain subsequent operations to perform on the object.

In fact, if you never actually enumerate the sequence, it will never actually execute, and this is also an easy trap to fall into. So my best advice would be to write your API according to what client code should expect. If you're returning a finite object, rather than returning IEnumerable, you should return IReadOnlyCollection or IReadOnlyList (or any read-only interface). That way, client code knows that enumeration has already been performed. If you return IEnumerable, client code should assume that enumeration will be required, and even the implementation should probably avoid enumerating to a terminal operation.

JetBrains.Annotations also has the [NoEnumeration] attribute that you can assign to an IEnumerable method parameter to indicate that your method isn't performing a terminal operation over the parameter.

marcusmajarra
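
A minimal sketch of the API-design advice above, using hypothetical `LazySquares` and `EagerSquares` methods: the `IEnumerable<int>` version does nothing until someone enumerates it, while the `IReadOnlyList<int>` version tells the caller the work is already done.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReturnTypeDemo
{
    // Counts how many items the iterator body has actually processed.
    public static int Processed;

    // Deferred: nothing runs until a caller enumerates the result.
    public static IEnumerable<int> LazySquares(IEnumerable<int> source)
    {
        foreach (var x in source)
        {
            Processed++;
            yield return x * x;
        }
    }

    // Materialized: the work is finished before the method returns,
    // and the return type says so.
    public static IReadOnlyList<int> EagerSquares(IEnumerable<int> source)
        => LazySquares(source).ToList();

    public static void Main()
    {
        var input = new[] { 1, 2, 3 };

        LazySquares(input);               // never enumerated, so the body never runs
        Console.WriteLine(Processed);     // 0

        IReadOnlyList<int> squares = EagerSquares(input);
        Console.WriteLine(Processed);     // 3
        Console.WriteLine(squares[2]);    // 9
    }
}
```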

Great illustration of how/why this happens. Something I can send to my peers that get confused as to why their code is hitting an API twice when running around with IEnumerable or IQueryable.

asteinerd

The biggest problem I have with LINQ in general, rather than IEnumerable specifically, is the heap allocation that takes place when evaluating queries with ToList() and the like in memory-sensitive hot paths. In almost every other scenario it's absolutely fine, but it makes my life hell when I have to do rate calculations on 100-500 messages/s.

Would be good to see you cover MemoryPool<T> and ArrayPool<T> at some point, those types have truly saved my bacon!

jamesmussett
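
For readers who have not used `ArrayPool<T>`, here is a small sketch of the idea: rent a scratch buffer instead of allocating a new list on every call. The `Median` helper and its use case are assumptions made for illustration, not the commenter's actual code.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

public static class PooledMedianDemo
{
    // Computes a median using a rented scratch buffer instead of ToList()/ToArray().
    public static double Median(IReadOnlyList<double> samples)
    {
        int n = samples.Count;
        if (n == 0) return 0; // arbitrary choice for the empty case in this sketch

        // Rent may hand back an array larger than requested; only the first n slots are used.
        double[] buffer = ArrayPool<double>.Shared.Rent(n);
        try
        {
            for (int i = 0; i < n; i++) buffer[i] = samples[i];
            Array.Sort(buffer, 0, n);
            return n % 2 == 1
                ? buffer[n / 2]
                : (buffer[n / 2 - 1] + buffer[n / 2]) / 2;
        }
        finally
        {
            // Always give the buffer back so later calls can reuse it.
            ArrayPool<double>.Shared.Return(buffer);
        }
    }

    public static void Main()
    {
        Console.WriteLine(Median(new double[] { 5, 1, 3 }));    // 3
        Console.WriteLine(Median(new double[] { 4, 1, 3, 2 })); // 2.5
    }
}
```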

Whoa, for the past year I was sometimes getting "Possible multiple enumerations" warnings and never knew what they meant :V Thank you!

Arekadiusz

The rule is simple: return precise types and accept abstract types. If you return a List<T>, then your method's return type should be List<T>, not IEnumerable<T>. That way consumers know exactly what the actual type is, and if they want to, they can narrow it to an interface implicitly.

sergiuszzalewski

IEnumerable - The fast-food restaurant of programming

Max-mxyc

My personal choice is to return an I…Collection, so that the consumer knows that the “inner” code isn’t deferred. Of course, there are situations where an IEnumerable is better, for instance when implementing repositories. But such repositories are mostly consumed from other application specific services.

mariorobben

I seem to remember the LINQ documentation explicitly stating that Enumerables are lazy-evaluated. It is a feature, one that all developers should be cognizant of so that they can force one-time evaluation when appropriate.

kenbrady

It really is quite clean; there is even a warning (at least in Visual Studio), CA1851: Possible multiple enumerations of IEnumerable collection.

it just requires the developer to read the warning and handle it. (or the senior developers elevate this from warning to a compilation error)

nocgod

The beauty of IEnumerables is lazy/deferred execution.
A trap (per this video's message) if you don't have a grasp of what it is.

Lazy/deferred execution I believe was borrowed from the Functional paradigm.
The idea is that you have a set of logic/algorithm which won't be executed/evaluated
until you explicitly ask for it.
In C# LINQ, you express the 'intention' by calling operators like
.First()
.ToList()
.Count()
.Any() etc.

Examples of lazy LINQ operators,
.Where()
.Select()
.OrderBy() etc.
These return an IEnumerable of <T>.
Lazy/deferred execution shines when composing/chaining functions and
when you intend to use your functions in between a "pipeline". Hence the above 3 are often used in a query chain/pipe.

Pertaining to collections, lazy evaluation passes only 1 item to each node/operator in the chain/pipe at a time.
But for eager evaluation, the whole collection is evaluated and passed down.
If there were conditions of 'early breaks', the latter won't benefit as the collection has been prematurely evaluated.
E.g. a lazy pipe/chain
products
.Where(p=> p.InStock()) // each product 'in stock', will flow down..
.Where(p=> p.Price < 3.14) // but only 1 at a time and not the full list because 'where' is lazy.
.Select(p=> p.ToShippable()) // Concatenated lazy chains act and behave as one (select is also lazy).
// I often combine multiple individual lazy operators to solve complex problems with very little concern for performance penalty.
// Shifting the order of the operators around is also quite easy as they are somewhat stand alone..

smwnl
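
The "one item at a time" behaviour described above can be observed by logging inside the lambdas. This is a sketch with made-up data and a hypothetical `Trace` helper; the `eager` flag inserts a `ToList()` in the middle of the chain to show the difference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyPipelineDemo
{
    // Returns the order in which the pipeline stages touched each item.
    public static List<string> Trace(bool eager)
    {
        var log = new List<string>();
        IEnumerable<int> prices = new[] { 5, 2, 9, 1 };

        IEnumerable<int> cheap = prices.Where(p => { log.Add($"check {p}"); return p < 3; });
        if (eager) cheap = cheap.ToList(); // forces all four checks up front

        IEnumerable<string> labels = cheap.Select(p => { log.Add($"label {p}"); return $"item {p}"; });
        labels.First();                    // terminal operation: stops at the first match
        return log;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Trace(eager: false)));
        // check 5, check 2, label 2
        Console.WriteLine(string.Join(", ", Trace(eager: true)));
        // check 5, check 2, check 9, check 1, label 2
    }
}
```

In the lazy run the items 9 and 1 are never checked, because `First()` stops pulling as soon as it has its answer.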

Basically it's very simple: LINQ is a pipeline (and chained yield returning function calls are as well). It's a series of enumerators chained together like a single expression, and will not be running until you run a loop on it, to perform actual work.
A function like Count() is a terminating operation, because it does not return an IEnumerable by itself but a computed, numeric result, meaning it has to run a loop on the preceding expression.
And a self written foreach loop is basically another terminating operation.
ToList and ToArray are as well, they create a new collection in memory and run a loop to fill it with data.
This means that ToList and ToArray come with the disadvantage of extra memory allocations.
Whereas not using them and repeating loops on the expression comes with a time and CPU usage penalty, as shown in the video.

jongeduard

The IEnumerable approach is the only sensible one in some situations, namely when there are too many items to fit in memory.
I find myself using a 'batchBy(int n)' approach: it turns an IEnumerable<T> into an IEnumerable<List<T>> so that you can work on a smaller list, and if things are too big, you can take them in bite-sized chunks.
It does mean something like 'count' (or other things that require global knowledge) can only be accumulated and discovered at the end of the sequence.

andytroo
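
One way the batching idea above might look is sketched below. The commenter's own implementation is not shown, so `BatchBy` here is an assumed reconstruction; on .NET 6 and later the built-in `Enumerable.Chunk` does a similar job (it yields arrays rather than lists).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchDemo
{
    // Hypothetical extension method: groups a lazy sequence into lists of at most 'size' items.
    public static IEnumerable<List<T>> BatchBy<T>(this IEnumerable<T> source, int size)
    {
        // Validate eagerly; the iterator below only runs once enumeration starts.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        return Iterate(source, size);
    }

    private static IEnumerable<List<T>> Iterate<T>(IEnumerable<T> source, int size)
    {
        var batch = new List<T>(size);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size); // start a fresh list; the caller keeps the old one
            }
        }
        if (batch.Count > 0) yield return batch; // trailing partial batch
    }

    public static void Main()
    {
        int total = 0; // global facts like the count are accumulated batch by batch
        foreach (var batch in Enumerable.Range(1, 7).BatchBy(3))
        {
            total += batch.Count;
            Console.WriteLine(string.Join(",", batch));
        }
        Console.WriteLine(total); // 7
    }
}
```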

I rolled my own "CachedEnumerable<T>" which lazy-caches the results of an enumeration - it's a wrapper over IEnumerable<T> which tests the underlying enumerable (e.g. is it IList<T>) and skips the cache for enumerables that are already cached/array-based. Using it gives me the best of both worlds - lazy enumeration and automatic caching.

carldaniel
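
The commenter's class is not shown, so the following is only a guess at how such a lazy cache could work: items are pulled from the source once, remembered in a list, and replayed on later enumerations. It is single-threaded only, and the `Cached` helper and `Pulls` counter are hypothetical names.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Sketch only: not thread-safe, and it never releases the cached items.
public sealed class CachedEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _source;
    private readonly List<T> _cache = new List<T>();
    private IEnumerator<T> _enumerator; // created on first use
    private bool _done;

    public CachedEnumerable(IEnumerable<T> source)
    {
        _source = source ?? throw new ArgumentNullException(nameof(source));
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (int i = 0; ; i++)
        {
            // Serve from the cache; pull one more item from the source only when needed.
            if (i == _cache.Count && !TryCacheNext()) yield break;
            yield return _cache[i];
        }
    }

    private bool TryCacheNext()
    {
        if (_done) return false;
        if (_enumerator == null) _enumerator = _source.GetEnumerator();
        if (_enumerator.MoveNext())
        {
            _cache.Add(_enumerator.Current);
            return true;
        }
        _done = true;
        _enumerator.Dispose();
        return false;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class CachedEnumerableDemo
{
    public static int Pulls; // how many items the underlying source produced

    private static IEnumerable<int> Source()
    {
        for (int i = 1; i <= 3; i++) { Pulls++; yield return i; }
    }

    // IList<T> sources are already in memory, so they are returned unchanged.
    public static IEnumerable<T> Cached<T>(IEnumerable<T> source)
        => source is IList<T> ? source : new CachedEnumerable<T>(source);

    public static int PullsAfterThreeQueries()
    {
        Pulls = 0;
        IEnumerable<int> cached = Cached(Source());
        int first = cached.First(); // pulls one item
        int sum = cached.Sum();     // reuses it, pulls the remaining two
        int count = cached.Count(); // served entirely from the cache
        return Pulls;
    }

    public static void Main()
    {
        Console.WriteLine(PullsAfterThreeQueries()); // 3
    }
}
```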

Materialization is not always a good option either. What if your file is 200 GB in size? Or what if there is a pseudo-infinite enumerable, like network data or a database cursor? You can't always convert to a list, because of memory. So yes, watch your code and only do what you understand. yield return is not bad if you know what you are doing.

LordXaosa

You have no idea how much this helped me today! I was looking at a problem where counting an IEnumerable with zero elements in it resulted in a significant delay and I thought I was going crazy! I had no idea that IEnumerable would be lazily evaluated. Thanks for the help! :)

michaellombardi

You could also address how yield returns are dangerous in that the source list can be changed between enumerations, e.g. items removed between the .Count() and the output.

stefanvestergaard
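
The hazard described in the comment above can be reproduced in a few lines. This is a sketch with a hypothetical `Evens` iterator: the count and the later output disagree because each enumeration re-reads the underlying list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MutatedSourceDemo
{
    // Deferred view over 'source': re-reads the list on every enumeration.
    private static IEnumerable<int> Evens(List<int> source)
    {
        foreach (var x in source)
            if (x % 2 == 0) yield return x;
    }

    public static (int CountBefore, int ItemsAfter) Run()
    {
        var stock = new List<int> { 1, 2, 3, 4 };
        IEnumerable<int> evens = Evens(stock);

        int countBefore = evens.Count(); // first enumeration sees 2 and 4
        stock.Remove(4);                 // the source changes in between

        int itemsAfter = 0;
        foreach (var x in evens) itemsAfter++; // second enumeration sees only 2
        return (countBefore, itemsAfter);
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine($"counted {result.CountBefore}, then printed {result.ItemsAfter}");
    }
}
```

Calling `ToList()` once and using that snapshot for both the count and the output avoids the mismatch.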