Speed up your Rust code with Rayon

Today we are learning how to easily parallelize your sequential Rust code with Rayon.

Chapters:
0:00 Intro
0:31 How to use Rayon
4:03 How Rayon works
6:04 Customization
6:16 Outro
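As a mental model for the "How Rayon works" chapter, here is a minimal std-only sketch of the divide-and-conquer shape behind Rayon's parallel iterators: split the input in half, process one half on another thread, and recurse until the pieces are small. It assumes Rust 1.63+ for `std::thread::scope`; `split_count` and `min_len` are hypothetical names. Unlike Rayon, it spawns a fresh OS thread per split instead of using a fixed thread pool with work stealing, so treat it as an illustration and not as Rayon's implementation.

```rust
use std::thread;

/// Counts elements matching `pred` by recursively halving the slice until
/// pieces are at most `min_len` long. Each split hands the right half to a
/// new scoped thread (not work stealing, unlike Rayon's pool).
fn split_count<T, F>(data: &[T], min_len: usize, pred: &F) -> usize
where
    T: Sync,
    F: Fn(&T) -> bool + Sync,
{
    // Treat min_len = 0 as 1 so the recursion always terminates.
    if data.len() <= min_len.max(1) {
        // Small enough: do the work sequentially on this thread.
        return data.iter().filter(|&x| pred(x)).count();
    }
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        // Right half runs on a new thread, left half on the current one.
        let handle = s.spawn(move || split_count(right, min_len, pred));
        let left_count = split_count(left, min_len, pred);
        left_count + handle.join().unwrap()
    })
}
```

For example, `split_count(&data, 1_000, &|x: &u32| x % 2 == 0)` counts the even numbers in `data`, processing pieces of at most 1,000 elements sequentially.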

Comments

I really like how your first example showed a situation where using multiple threads is actually slower. A lot of people and explanations treat big-O notation and performance as '(better big O || more threads) == more better'. What often doesn't get mentioned is that it depends heavily on the context and on the amount of data you are working with.

Optimization without benchmarking isn't optimization.

mathijsfrank
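In the spirit of that comment, a minimal timing helper using only the standard library might look like this. `time_runs` and `median` are hypothetical names; a real harness such as criterion adds warm-up control, outlier detection and statistics that this sketch skips. `std::hint::black_box` (stable since Rust 1.66) is a best-effort hint meant to keep the compiler from optimizing the measured work away.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` once as a warm-up, then `runs` more times, returning the
/// duration of each timed run. No outlier rejection or statistics.
fn time_runs<R, F: FnMut() -> R>(mut f: F, runs: usize) -> Vec<Duration> {
    black_box(f()); // warm-up run, result discarded
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            black_box(f()); // discourage the optimizer from deleting the work
            start.elapsed()
        })
        .collect()
}

/// Median of the samples (the upper median for an even count).
fn median(mut samples: Vec<Duration>) -> Option<Duration> {
    samples.sort();
    samples.get(samples.len() / 2).copied()
}
```

Timing both the sequential and the parallel version at several input sizes is how you would find out where, if anywhere, the parallel version starts to win.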

I definitely appreciate you showing a case where it isn't actually faster. It really helps highlight the importance of benchmarking if you intend to actually optimize.

timwhite

Polars (a lightning-fast DataFrame library) could be a good fit for the next video. You could even benchmark it the same way you did in this video.

williamdroz

Great video!

At 4:49 I had to check my glasses, then my connection, then I realized what you did :)

codeshowbr

Thankfully, I have already used it for my weekend ray tracer. It's pretty easy and scales perfectly for longer tasks.

LtdJorge

The content was great. It was helpful to see that the parallel version can be slower in some situations. The stock footage was a bit distracting, especially the blurry bit (maybe that was an in-joke I didn’t get).

Perspectologist

It reminds me of the parallelStream method in Java's Stream API.

ИмяФамилия-хве

Mate, I fkn love your videos. Every time I am stuck, you have a video there to save the day.

DavidAlsh

Curious what it does under the hood. If you wrote the parallelism yourself for two cores, you would split the 200,000 items into two arrays and assign one core to each array, which ought to have minimal overhead. But if it splits the 200,000 items into 200,000 tasks that have to be stolen, that is a lot of overhead per small item.

If you rewrote your iteration to run over chunks, counting within each chunk and adding the counts together at the end, would Rayon perform better?

You could divide the task into 2, 4, 8, 16, 32, 64 and 128 chunks and see how much performance degrades. But I bet even 128 chunks, which would spread out well over most CPUs, would have a 1000x better ratio of overhead to benefit than 200,000 individual tasks.

AlwinMao
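The manual approach described in this comment (one task per chunk, partial counts added at the end) can be sketched with scoped threads from the standard library. `chunked_count` is a hypothetical helper and assumes Rust 1.63+. Note that Rayon does not necessarily create one task per element: its iterators split the input into larger pieces adaptively, and slices also offer `par_chunks` for explicit chunking. Varying `n_chunks` as the commenter suggests and timing each run would answer the question empirically.

```rust
use std::thread;

/// Splits `data` into at most `n_chunks` contiguous chunks, counts matches
/// in each chunk on its own scoped thread, and sums the partial counts.
fn chunked_count<T, F>(data: &[T], n_chunks: usize, pred: F) -> usize
where
    T: Sync,
    F: Fn(&T) -> bool + Sync,
{
    if data.is_empty() {
        return 0;
    }
    let n = n_chunks.max(1);
    // Ceiling division: chunk_len >= 1 here, and at most `n` chunks result.
    let chunk_len = (data.len() + n - 1) / n;
    let pred = &pred;
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().filter(|&x| pred(x)).count()))
            .collect();
        // Combine the per-chunk counts once every thread has finished.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```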

Rayon is really amazing! It's actually incredibly performant.
I have even compared it with loop parallelization in Fortran and C, which a compiler such as GCC can do with far fewer guarantees. In my tests Rayon was faster, even though modern Fortran in particular has some interesting features of its own, such as the `do concurrent` loop and so-called array programming features.
Apart from that, automatic loop parallelization only works with the proper compiler flags; without them it simply doesn't happen.
I like Rust's expressive, functional style a lot more: that problem does not exist there, Rayon handles it much more smartly, and you can tRust it.
It also reminds me of Parallel LINQ in C#, which is similar, although it obviously cannot compete with Rust's performance at all.
In this light, I would also like to mention ndarray, a really powerful crate for multidimensional arrays. With tools like these, I can definitely see a serious place for Rust in both game development and scientific parallel computing. Really amazing!

jongeduard

It would have been nice if you had gone into some of the deeper stuff in your discussion.

Under the covers you are adding complexity to your code.

It would have been interesting to see how Rayon handles locking and inter-thread communication, since you are implementing the classic librarian/reader problem.
Also, even if you are not using multiple cores, you can still gain performance from concurrency, depending on the task (e.g., while one thread is blocked waiting, another thread can do work).

michaelsegel
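The point about blocked threads can be illustrated with a small std-only sketch: two jobs that each sleep (standing in for a blocking call such as I/O) run on separate scoped threads, so their waits overlap. `run_overlapped` is a hypothetical name, and `thread::sleep` is only a stand-in for real blocking work.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Runs two jobs that each block for `wait` and then do a small sum.
/// Because each job has its own thread, the waits can overlap even on a
/// single core. Returns both results and the total elapsed time.
fn run_overlapped(wait: Duration) -> (u64, u64, Duration) {
    let start = Instant::now();
    let (a, b) = thread::scope(|s| {
        let job = move |n: u64| {
            thread::sleep(wait); // stand-in for a blocking call
            (1..=n).sum::<u64>()
        };
        // `job` only captures a Duration, so it is Copy and can be reused.
        let ha = s.spawn(move || job(10));
        let hb = s.spawn(move || job(100));
        (ha.join().unwrap(), hb.join().unwrap())
    });
    (a, b, start.elapsed())
}
```

On a typical machine the reported elapsed time should be close to a single `wait` rather than two, though the exact figure depends on the scheduler.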

I ran a comparison of similar code with and without rayon. The non-parallelized code ran more than 2X faster. But it did not call collect() so it wasn't a perfect comparison. I wasn't able to adapt the rayon code to run without calling collect() first. I was able to change the non-parallel code so it would call collect and then iterate. This was a more meaningful comparison, and the parallelized code was about 10% faster. The task was to add one billion f64 numbers all equal to 1.0.

fsaldan
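To sum without materializing a collection first, each thread can work on its own index range and only the partial sums are combined. Below is a std-only sketch of that idea, with `parallel_sum_ones` as a hypothetical name. With Rayon the equivalent is roughly `(0..n).into_par_iter().map(|_| 1.0).sum::<f64>()`, which likewise needs no `collect()`. Because partial sums are added in a different order than a sequential loop would use, floating-point results can differ slightly in general; with every value equal to 1.0 they stay exact up to 2^53.

```rust
use std::thread;

/// Adds `n` copies of 1.0_f64 across `threads` scoped threads without
/// allocating a Vec: each thread sums its own index range, then the
/// partial sums are added together.
fn parallel_sum_ones(n: u64, threads: u64) -> f64 {
    let threads = threads.max(1);
    let per = n / threads;
    thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|t| {
                let lo = t * per;
                // The last thread also takes the remainder.
                let hi = if t == threads - 1 { n } else { lo + per };
                s.spawn(move || (lo..hi).map(|_| 1.0_f64).sum::<f64>())
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```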

Another crate to cover would be Polars or DataFusion. Both are DataFrame libraries based on Apache Arrow. Polars' Rust documentation is a bit sketchy at the moment, and DataFusion appears to prefer doing everything asynchronously.

Direkin

Great overview. I had heard of Rayon before and got a good impression back then, but this kind of concise insight is much easier on my brain.
I would appreciate a similar treatment for Elementum once it's out.

TheLomsor

I could only measure the performance benefits of Rayon and xargs by benchmarking. Is there a deterministic way to estimate the performance benefit for a specific task beforehand? In my case, I have large chunked files and about 30 CPU cores in the computing center. Whenever I use Rayon and xargs together, the performance somehow drops. Let's say the task is building a frequency table where each line is the count of a quantity in these large files.

What would be cool is a video on how to parallelize code across different CPUs (not threads of the same CPU) on the same machine, e.g. on an HPC cluster. In C you would use MPI for that. How would that work in Rust?

Metagross
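For the frequency-table case, a common pattern is map-reduce: each thread counts into its own private `HashMap`, so no locking is needed, and the maps are merged at the end. Rayon expresses the same shape with `fold` followed by `reduce`; the sketch below does it by hand with scoped threads. `line_frequencies` is a hypothetical name, it assumes the text is already in memory (very large files would need streaming), and it simply counts identical lines rather than parsing a quantity out of each line.

```rust
use std::collections::HashMap;
use std::thread;

/// Frequency table of the lines in `text`: each scoped thread counts its
/// slice of lines into a private HashMap, then the maps are merged.
fn line_frequencies(text: &str, threads: usize) -> HashMap<&str, usize> {
    let lines: Vec<&str> = text.lines().collect();
    if lines.is_empty() {
        return HashMap::new();
    }
    let threads = threads.max(1);
    let chunk_len = (lines.len() + threads - 1) / threads;
    thread::scope(|s| {
        // Map step: one private table per chunk, no shared state.
        let handles: Vec<_> = lines
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    let mut local: HashMap<&str, usize> = HashMap::new();
                    for &line in chunk {
                        *local.entry(line).or_insert(0) += 1;
                    }
                    local
                })
            })
            .collect();
        // Reduce step: fold every per-thread table into one.
        let mut total: HashMap<&str, usize> = HashMap::new();
        for h in handles {
            for (line, count) in h.join().unwrap() {
                *total.entry(line).or_insert(0) += count;
            }
        }
        total
    })
}
```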

On the second benchmark example, I think the error bars should have been called out: it compares 300 ± 30 ms with 200 ± 130 ms.

isabelkaspriskie
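To make that comparison concrete, a quick check is whether the two mean ± spread ranges overlap: 300 ± 30 ms spans 270 to 330 ms and 200 ± 130 ms spans 70 to 330 ms, so they do. The helpers below (hypothetical names) compute a mean and sample standard deviation from raw timings and perform that overlap check. Overlapping ranges are only a rough warning sign, not a formal significance test.

```rust
/// Mean and sample standard deviation (n - 1 denominator) of timings,
/// e.g. in milliseconds. Returns None for fewer than two samples.
fn mean_std(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    Some((mean, var.sqrt()))
}

/// True if the ranges `mean ± spread` of two measurements overlap.
/// Each argument is a (mean, spread) pair.
fn ranges_overlap(a: (f64, f64), b: (f64, f64)) -> bool {
    let (a_lo, a_hi) = (a.0 - a.1, a.0 + a.1);
    let (b_lo, b_hi) = (b.0 - b.1, b.0 + b.1);
    a_lo <= b_hi && b_lo <= a_hi
}
```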

Someday, they'll have ChatGPT (GPT-3/4) integrated to show you how to fix your compiler errors, or fix them for you with a little "fix it" button. Then, Rust will truly be easy for all.

jeffg

So, if I understand correctly, in your example code, parallelizing it only made sense when you were processing a vector of at least a certain length. And it would've only made sense with a smaller vector if your filter operation had been more expensive, right? And obviously Rayon can't see how expensive your filter operation is, so Rayon can't make an educated guess about when the vector is long enough to justify parallelizing it.

In that case, wouldn't it make sense if Rayon would offer a method like > 2e6)` ?

EvertvanBrussel
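The commenter's exact method is cut off above, so the following only sketches the general idea: fall back to a plain sequential loop below a length threshold and split across cores above it. `count_with_threshold` is a hypothetical helper built on std scoped threads (Rust 1.63+), and the right threshold still has to come from benchmarks. In Rayon itself, a related built-in knob is `with_min_len` on indexed parallel iterators, which keeps Rayon from splitting work into pieces shorter than the given length.

```rust
use std::thread;

/// Hypothetical helper: counts matches sequentially when `data` is shorter
/// than `threshold`, otherwise splits it across the available cores.
fn count_with_threshold<T, F>(data: &[T], threshold: usize, pred: F) -> usize
where
    T: Sync,
    F: Fn(&T) -> bool + Sync,
{
    if data.len() < threshold {
        // Too small to be worth the threading overhead.
        return data.iter().filter(|&x| pred(x)).count();
    }
    // Fall back to 1 if the core count cannot be determined.
    let cores = thread::available_parallelism().map_or(1, |n| n.get());
    let chunk_len = ((data.len() + cores - 1) / cores).max(1);
    let pred = &pred;
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().filter(|&x| pred(x)).count()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```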

I've been making benches with criterion, which works fine, but for small tests like this I had absolutely no idea this way of benchmarking existed lmao

oxey_
