Don’t Use the Wrong LINQ Methods

Hello, everybody, I'm Nick, and in this video, I will show you the difference between two LINQ-associated methods that look exactly the same but perform very differently.

Don't forget to comment, like and subscribe :)

#csharp #dotnet

Comments

Small correction at 8:03. The 40 bytes are not because an enumerator was allocated; in this case the enumerator that List<T> gives back is a struct. The 40 bytes are there because the struct has to be boxed into an IEnumerator interface, since the foreach operated on an IEnumerable. A foreach on a List<T> doesn't allocate and is faster than a foreach on (IEnumerable<T>)List<T>, which does allocate. List<T> implements the interface's GetEnumerator explicitly (to hide it, because it returns an interface) and adds another GetEnumerator method that returns the struct enumerator directly. foreach uses that one, which avoids boxing to an interface. It's also faster because the calls are direct (static binding, because they are struct methods) instead of virtual (dynamic binding through a vtable, because of the interface).

mrahhal
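
The two GetEnumerator paths from the comment above can be made visible with a small sketch. The class and method names here are made up for illustration; only `List<T>.GetEnumerator()` and the `IEnumerable<T>` interface method are real APIs. It assumes a non-empty list, because newer runtimes may hand back a shared empty enumerator for empty lists.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorKinds
{
    // Hypothetical helper: reports whether the object behind the interface
    // is the boxed List<int>.Enumerator struct. Assumes list is non-empty.
    public static bool IsBoxedListEnumerator(List<int> list)
    {
        // Public method on List<T>: returns the struct itself, nothing is boxed.
        List<int>.Enumerator direct = list.GetEnumerator();
        direct.Dispose();

        // Explicit interface implementation: the same struct, boxed behind IEnumerator<int>.
        IEnumerator<int> viaInterface = ((IEnumerable<int>)list).GetEnumerator();
        try
        {
            object boxed = viaInterface;
            return boxed.GetType() == typeof(List<int>.Enumerator);
        }
        finally
        {
            viaInterface.Dispose();
        }
    }
}
```

A `foreach` over a variable typed as `List<int>` binds to the first call; a `foreach` over the same list typed as `IEnumerable<int>` binds to the second.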

You can try this List extension method approach as a simple fallback too since it's easier to incorporate into your code rather than changing the underlying List implementation:

namespace System.Collections.Generic
{
    public static class ListExtensions
    {
        public static bool All<T>(this List<T> list, Predicate<T> predicate)
        {
            return list.TrueForAll(predicate);
        }
    }
}

The same approach can be extended to other methods, such as Any => Exists, FirstOrDefault => Find and so on.

RadusGoticus
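
As a rough sketch of the extension the comment above suggests, the `Any => Exists` and `FirstOrDefault => Find` mappings might look like this. The class name `ListLinqShims` is hypothetical; `List<T>.Exists` and `List<T>.Find` are the real list methods. When both this class and `System.Linq` are in scope, the `List<T>` overloads are expected to win overload resolution because the receiver type matches exactly, but that is worth verifying in your own project.

```csharp
using System;
using System.Collections.Generic;

public static class ListLinqShims
{
    // Any(predicate) forwards to List<T>.Exists.
    public static bool Any<T>(this List<T> list, Predicate<T> predicate)
    {
        return list.Exists(predicate);
    }

    // FirstOrDefault(predicate) forwards to List<T>.Find, which also
    // returns default(T) when no element matches.
    public static T FirstOrDefault<T>(this List<T> list, Predicate<T> predicate)
    {
        return list.Find(predicate);
    }
}
```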

The enumerator for the List<T> will check the _version field of the list on the MoveNext calls. That field is updated every time the list is modified, so if the list is modified while you are iterating over it, you will get an InvalidOperationException informing you that the collection was modified. TrueForAll does not check the _version field and just directly indexes into the list. If All used TrueForAll under the covers, then that would be a behavior change. Also, is Func<T, bool> equivalent to Predicate<T>? I wasn't aware that the latter existed until watching this video.

ecpcorran
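
The behavior difference described above can be sketched as follows. The class and method names are hypothetical. The sketch uses a plain `foreach` rather than `Enumerable.All`, since how `All` enumerates its source may differ between runtime versions. The last method touches on the delegate question: `Func<T, bool>` and `Predicate<T>` have the same signature but are distinct types with no implicit conversion between them, so one has to be wrapped in the other.

```csharp
using System;
using System.Collections.Generic;

public static class VersionCheckSketch
{
    // Mutates the list while enumerating it. Returns true if the
    // enumerator noticed the change and threw.
    public static bool ForeachThrowsOnModification()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int x in list)
            {
                if (x == 1) list.Add(4); // changes the list's internal version
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Performs the same mutation from inside a TrueForAll predicate and
    // returns the final element count. No version check gets in the way.
    public static int CountAfterMutatingTrueForAll()
    {
        var list = new List<int> { 1, 2, 3 };
        bool added = false;
        list.TrueForAll(x =>
        {
            if (!added) { added = true; list.Add(4); }
            return true;
        });
        return list.Count;
    }

    // Wraps a Func<int, bool> in a Predicate<int>; the types are not interchangeable.
    public static Predicate<int> AsPredicate(Func<int, bool> func)
    {
        return new Predicate<int>(func);
    }
}
```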

There are already many LINQ methods that have different behaviors depending on the real type of the source enumerable. Why don't they change the All method so that it uses TrueForAll when source is a List<T> or a T[]?

Krimog
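
A minimal sketch of the type-dispatching idea from the comment above, assuming indexed loops for the two special cases. `AllFast` is a hypothetical name and this is not how the BCL implements `All`. Note that the `List<T>` branch gives up the "collection was modified" check that the enumerator performs.

```csharp
using System;
using System.Collections.Generic;

public static class DispatchingAll
{
    // Hypothetical helper: picks a loop based on the runtime type of source.
    public static bool AllFast<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));

        if (source is T[] array)
        {
            // Arrays: plain indexed loop, no enumerator.
            for (int i = 0; i < array.Length; i++)
            {
                if (!predicate(array[i])) return false;
            }
            return true;
        }

        if (source is List<T> list)
        {
            // Lists: indexed loop, so no enumerator and no version check.
            for (int i = 0; i < list.Count; i++)
            {
                if (!predicate(list[i])) return false;
            }
            return true;
        }

        // Everything else: fall back to the enumerator.
        foreach (T item in source)
        {
            if (!predicate(item)) return false;
        }
        return true;
    }
}
```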

I'm curious if this is generally true for other LINQ methods with native counterparts.

For example Exists vs Contains or Any / Where vs FindAll.

Would the only difference be the generation of the Enumerator? I know some LINQ methods do some magic under the hood, but I always assume the native implementation to be better.

lordshoe

TrueForAll is not a LINQ method. It is a List<T> method, just like Add or Remove. Don't mix them up.

mightybobka

I would defer to the principle of least surprise: if the performance is not a (measured!) issue, use the generic .All(), otherwise do whatever weird stuff you gotta do for performance.

I wish collections had specialized implementations of certain LINQ methods. E.g. having to use the trifecta of .Length, .Count and .Count() depending on what kind of collection you're working with is annoying, and it feels like specialized implementations of .Count() could exist for Lists/Arrays/etc.

I bet someone is going to do some nasty things about this with interceptors at some point...

onetoomany
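
For reference, the .Length / .Count / .Count() trifecta mentioned above looks like this in practice. The names are illustrative only. One relevant fact: `Enumerable.Count()` already returns the `Count` property directly when the source implements `ICollection<T>`, which arrays and `List<T>` both do, so the second method below does not enumerate the array or the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountTrifecta
{
    // One size member per kind of collection.
    public static (int FromArray, int FromList, int FromSequence) NativeSizes()
    {
        int[] array = { 1, 2, 3 };
        var list = new List<int> { 1, 2, 3 };
        IEnumerable<int> lazy = list.Where(x => x > 1);

        return (
            array.Length,   // arrays expose Length
            list.Count,     // List<T> exposes the Count property
            lazy.Count());  // a lazy sequence needs the LINQ Count() method
    }

    // The LINQ Count() method works on all three, at the cost of a type check.
    public static (int FromArray, int FromList, int FromSequence) UniformSizes()
    {
        int[] array = { 1, 2, 3 };
        var list = new List<int> { 1, 2, 3 };
        IEnumerable<int> lazy = list.Where(x => x > 1);

        return (array.Count(), list.Count(), lazy.Count());
    }
}
```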

Maybe it would be possible to write a source generator that intercepts the LINQ method call and forces the use of the most optimized implementation according to the type.

kikinobi

Do you really need your own `MyList` class? Wouldn't the `foreach` on the standard `List<T>` have worked the same?

The problem with the `foreach` in `Enumerable.All()` is that the static type of the sequence is `IEnumerable<T>`. That causes boxing of the enumerator and virtual calls for `MoveNext()` and `Current`.

If the static type were `List<T>`, the `foreach` would have been as performant as the `for`.

vyrp

Think of IEnumerable<T> as a linked list: counting the elements is in itself an enumeration of the list. The implementation of All() therefore cannot depend on the element count the way TrueForAll does, and it can't avoid the enumerator.
Using TrueForAll on an IEnumerable<T> would require a ToList() or ToArray() call first, which also uses the enumerator unless the source can be pattern matched to ICollection, in which case it does a memcpy.

Even if the source is an ICollection and TrueForAll() could be used, this is actually not the same behavior as All(), because an Enumerator does more on each MoveNext than TrueForAll does in its loop body, which another commenter has already pointed out.

NateHK

So the collections need an optimal fold implementation with early exit, right? Having that, one can express First(), Single(), Any(), Any(Func<, >), All(Func<, >), Select(Func<, >), Aggregate(...), Skip(int), Where(Func<, bool>), Take(int) and some others without need for IEnumerable<>. Something like `IFoldable<T> { TAgg Fold<TAgg>(TAgg seed, Func<TAgg, T, Option<TAgg>> f) }`. Now add a lazy `IFoldable<T> Reverse()` that would exploit indexing and you get Last(), SkipLast(int), TakeLast(int)

tkjyxrb
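
A rough sketch of the fold-with-early-exit idea above. The comment's `Option<TAgg>` does not say which value an early exit should return, so this sketch swaps in a `(Value, Stop)` tuple instead; that substitution, the `FoldableList<T>` wrapper, and the extension class are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public interface IFoldable<T>
{
    // f returns the next accumulator plus a flag that ends the fold early.
    TAgg Fold<TAgg>(TAgg seed, Func<TAgg, T, (TAgg Value, bool Stop)> f);
}

// Hypothetical wrapper that folds over a List<T> by index, without an enumerator.
public sealed class FoldableList<T> : IFoldable<T>
{
    private readonly List<T> _items;

    public FoldableList(List<T> items)
    {
        _items = items;
    }

    public TAgg Fold<TAgg>(TAgg seed, Func<TAgg, T, (TAgg Value, bool Stop)> f)
    {
        TAgg acc = seed;
        for (int i = 0; i < _items.Count; i++)
        {
            var step = f(acc, _items[i]);
            acc = step.Value;
            if (step.Stop) break; // early exit
        }
        return acc;
    }
}

public static class FoldableExtensions
{
    // All: stop with false at the first element that fails the predicate.
    public static bool All<T>(this IFoldable<T> source, Func<T, bool> predicate)
    {
        return source.Fold(true, (acc, x) => predicate(x) ? (true, false) : (false, true));
    }

    // Any: stop with true at the first element that passes the predicate.
    public static bool Any<T>(this IFoldable<T> source, Func<T, bool> predicate)
    {
        return source.Fold(false, (acc, x) => predicate(x) ? (true, true) : (false, false));
    }
}
```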

The foreach issue strikes again.
I wonder if C# could just have two paths when running any foreach: if it's a simple List<T>, run the basic for loop with no enumerator.

Then LINQ doesn't have to be responsible for checking for List/non-List switching.

VeNoM

So basically TrueForAll() >= All()?

I don't understand why MS did it this way. Let's say All() existed before TrueForAll(): why didn't MS just replace the implementation of All() with the more performant one? And if TrueForAll() existed before All(), why did they even add All() in the first place?

neralem

Nick, you provided the wrong explanation: the allocations are due to the interface being the parameter type.

Just make two functions with a foreach inside: one with a list parameter, and the other with an interface like IReadOnlyCollection; pass a list into both and behold the allocations in the second method.

andreypiskov_legacy
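
The experiment described above can be sketched like this. The class and method names are hypothetical; `GC.GetAllocatedBytesForCurrentThread()` is a real API on recent .NET versions. The measured numbers depend on the runtime and JIT, so none are claimed here.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachAllocations
{
    // Parameter typed as List<int>: foreach uses the struct enumerator.
    public static int SumList(List<int> list)
    {
        int sum = 0;
        foreach (int x in list) sum += x;
        return sum;
    }

    // Parameter typed as an interface: foreach goes through IEnumerator<int>.
    public static int SumReadOnly(IReadOnlyCollection<int> items)
    {
        int sum = 0;
        foreach (int x in items) sum += x;
        return sum;
    }

    // Bytes allocated on the current thread while the action runs.
    public static long AllocatedBy(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

Calling `ForeachAllocations.AllocatedBy(() => ForeachAllocations.SumList(list))` and the same with `SumReadOnly` on one list lets you compare the two paths; running each once beforehand as a warm-up keeps one-time JIT costs out of the numbers.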

Someone made LinqAF. It's LINQ, but implemented entirely using structs, and it is nearly allocation-free (hence the AF). Apparently it's a little slower than LINQ, but its performance is more consistent for cases such as game development because of the immensely reduced allocations.

RealCheesyBread

Why didn't the "x => x > 0" lambda contribute to allocations? Is it some kind of C# compiler optimization?

nterstellar_yt
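
On the lambda question above: a lambda that captures nothing, such as `x => x > 0`, is cached by current C# compilers in a static field, so the delegate is created once and reused. That caching is a compiler implementation detail rather than a language guarantee. A lambda that captures a local or parameter needs a closure object and a fresh delegate each time. A small sketch with hypothetical names:

```csharp
using System;

public static class LambdaCaching
{
    // Captures nothing: the compiler can reuse one cached delegate instance.
    public static Func<int, bool> NonCapturing()
    {
        return x => x > 0;
    }

    // Captures threshold: every call allocates a closure and a new delegate.
    public static Func<int, bool> Capturing(int threshold)
    {
        return x => x > threshold;
    }
}
```

Comparing two results of `NonCapturing()` with `ReferenceEquals` shows whether your compiler reuses the instance; two results of `Capturing(0)` are always different objects.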

I wonder how ConvertAll vs Select, Find vs FirstOrDefault, FindAll vs Where, Exists vs Any perform, in both List and Array types.
I expected them to be optimised like it's done for Count(), but in this video we see it's not a rule.

kyjiv

They need to have some improvements left for .NET 9 😅

ryan-heath

As a Unity developer, I always create extensions that copy LINQ methods, optimised for arrays and lists. That's my workaround: I either carry that heap of extensions over or write them anew. I use Select and ToArray a lot, so I make a SelectArray method for status, lists and enumerables and roll with it.

zORg_alex
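
A guess at what a SelectArray extension like the one mentioned above could look like; the commenter's actual code is not shown, so the signatures below are assumptions. Because the source length is known up front, the result array is allocated once at its final size.

```csharp
using System;
using System.Collections.Generic;

public static class SelectArrayExtensions
{
    // Hypothetical: Select + ToArray for List<T> in a single indexed pass.
    public static TResult[] SelectArray<T, TResult>(this List<T> list, Func<T, TResult> selector)
    {
        var result = new TResult[list.Count];
        for (int i = 0; i < list.Count; i++)
        {
            result[i] = selector(list[i]);
        }
        return result;
    }

    // Hypothetical: the same for arrays.
    public static TResult[] SelectArray<T, TResult>(this T[] array, Func<T, TResult> selector)
    {
        var result = new TResult[array.Length];
        for (int i = 0; i < array.Length; i++)
        {
            result[i] = selector(array[i]);
        }
        return result;
    }
}
```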

TL;DR: list implementations are better than generic IEnumerable methods, and btw one of the 2 methods discussed is not even LINQ o_O, so the title isn't very honest. Now, if you're interested in why (foreach vs for), feel free to skip to ~5:00 (although reading the comments is probably a better idea, as the explanation from the video is arguable).

kocot.