Finding and Fixing Slow Code // Ray Tracing series




Welcome to the exciting new Ray Tracing Series! Ray tracing is a very common technique for generating photo-realistic digital imagery, which is exactly what we'll be doing in this series. Aside from learning all about ray tracing and the math that goes into it, as well as how to implement it, we'll also be focusing on performance and optimization in C++ to make our renderer as efficient as possible. We'll eventually switch to using the GPU instead of the CPU (using Vulkan) to run our ray tracing algorithms, as this will be much faster than using the CPU. This will also be a great introduction to leveraging the power of the GPU in the software you write. All of the code episode-by-episode will be released, and if you need help check out the raytracing-series channel on my Discord server. I'm really looking forward to this series and I hope you are too! ❤️

This video is sponsored by Brilliant.

#RayTracing
Comments
Author

A great GDC talk, 'Noise-Based RNG', could be applied in this scenario: using a purely functional RNG for the maths and seeding it with the frame number, index and thread index would give you a 'stateless' PRNG that is more threading-friendly.

This could also cut out some of the random drops in render time for a single frame. At the moment it's using the Mersenne Twister, which occasionally has to rebuild its internal state when it runs out of buffer, and that is a relatively expensive operation.

JaceMorley
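
A minimal sketch of the stateless idea from the comment above. It does not reproduce the noise function from the GDC talk; it swaps in the SplitMix64 finalizer as the mixing step. The names `mix64` and `stateless_random` are hypothetical, and keying on a sample index instead of a thread index is an assumption made here so that the result does not depend on which thread happens to render a pixel.

```cpp
#include <cassert>
#include <cstdint>

// SplitMix64 finalizer: a well-known 64-bit integer mixing function.
// Assumption: used here as a stand-in for the noise function from the talk.
inline std::uint64_t mix64(std::uint64_t z) {
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Hypothetical helper: a float in [0, 1) that depends only on its arguments.
// No generator object is stored anywhere, so nothing is shared between threads.
inline float stateless_random(std::uint32_t frame, std::uint32_t pixel, std::uint32_t sample) {
    const std::uint64_t golden = 0x9e3779b97f4a7c15ULL;
    // Pack frame and pixel into one 64-bit key, then mix in the sample index.
    std::uint64_t key = (static_cast<std::uint64_t>(frame) << 32) | pixel;
    std::uint64_t h = mix64(mix64(key + golden) + sample * golden);
    // Keep the top 24 bits so the value is exactly representable as a float.
    return static_cast<float>(h >> 40) * (1.0f / 16777216.0f);
}
```

Calling `stateless_random(frame, pixel, sample)` twice with the same arguments returns the same value, which is what makes it safe to call from any number of threads without locks or per-thread state.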
Author

Had to solve a problem exactly like this in a simulator written in Go. We ended up having a separate random number generator per thread/goroutine (as you described). Very cool

jackevansevo
Author

I work on some old server software in Java (which at the moment is performing poorly for a variety of reasons), and I literally just had a problem similar to this with a random number generator across multiple threads. This code had existed for years, and as soon as we used a thread local RNG (like in this video) we saw an immediate improvement. Really good video on the debugging process for a problem like this.

auwee
Author

A non-cryptographic hash is more what you want, rather than a PRNG like the Twister. They tend to be faster, have a better distribution, and aren't so dependent on constantly re-iterating a state variable. And parallelism is kind of built into the concept instead of being tacked on.

ernststravoblofeld
Author

In fact, this scary problem is called "false sharing". It's one of the most important things to consider in multi-threaded programming!

Sopiro
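
For reference, false sharing in the strict sense happens when threads write to different variables that sit on the same cache line. A minimal sketch of the usual remedy, padding each thread's data out to its own cache line, follows. The 64-byte line size is an assumption (typical for current x86-64 CPUs), `PaddedCounter` and `count_in_parallel` are hypothetical names, and C++17 is assumed so that `std::vector` honours the over-alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// Each counter occupies a whole cache line, so threads updating
// neighbouring counters never write to the same line.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};

// Hypothetical workload: every thread increments only its own counter.
inline std::uint64_t count_in_parallel(unsigned threads, std::uint64_t iterations) {
    std::vector<PaddedCounter> counters(threads);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&counters, t, iterations] {
            for (std::uint64_t i = 0; i < iterations; ++i) ++counters[t].value;
        });
    }
    for (auto& th : pool) th.join();

    // Sum the per-thread results once all workers have finished.
    std::uint64_t total = 0;
    for (const auto& c : counters) total += c.value;
    return total;
}
```

Removing `alignas(kCacheLine)` keeps the program correct but packs eight counters into one line, which is the layout that tends to show up as unexplained slowdowns in a profiler.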
Author

Imagine having a sponsor that sponsors not just "this portion of the video" but the whole video, the whole series even. It's a sign of quality and trust these days.

PostNoteIt
Author

Thank you for this series. I have implemented real time raytracing from scratch, both on CPU and GPU and still there are a lot of valuable ideas I can get from you.

onecoding
Author

I totally forgot about thread_local. It's the kind of thing that you always forget but that makes a huge impact on multi-threading performance.

Mempler
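
A minimal sketch of a `thread_local` generator, for readers who have not used the keyword. This is not the code from the video; `random_float` is a hypothetical helper, and seeding from `std::random_device` is an assumption.

```cpp
#include <cassert>
#include <random>
#include <thread>
#include <vector>

// Hypothetical helper: returns a float nominally in [0, 1).
// Each thread constructs its own engine and distribution the first time it
// calls this function, so no generator state is shared between threads.
inline float random_float() {
    thread_local std::mt19937 engine{std::random_device{}()};
    thread_local std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    return dist(engine);
}
```

Because the engine is per-thread, calls from different threads never touch the same memory, at the cost of one engine (about 2.5 kB for `std::mt19937`) per thread.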
Author

I'd advise going for a custom PRNG, choosing one that is a lot more modern than the Mersenne Twister engine used by the standard library. It's just a bad, outdated engine, and recalculating the 2.5kB state of the engine on every single pixel is just as bad. We have far more modern PRNGs with tiny states and even better RNG results (though slightly at the cost of period length, which doesn't really matter for ray tracing anyway).
So I'd just abandon the MT engine for the sake of reducing the state from a few kilobytes to merely 32 bytes at most, going with something like pcg64, xoshiro or splitmix64, as those can easily generate gigabytes of random data in seconds.

LimaXv
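
To give a sense of how small such a generator can be, here is a minimal sketch of PCG32 (the XSH-RR variant), which is a smaller relative of the pcg64 named above rather than pcg64 itself. The struct name `Pcg32` and the `next_float` helper are hypothetical; the state is 16 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Minimal PCG32 (XSH-RR): a 64-bit LCG state plus an output permutation.
struct Pcg32 {
    std::uint64_t state;
    std::uint64_t inc;  // stream selector, always odd

    explicit Pcg32(std::uint64_t seed, std::uint64_t stream = 1)
        : state(0), inc((stream << 1) | 1u) {
        next();
        state += seed;
        next();
    }

    // Advance the state and return 32 random bits.
    std::uint32_t next() {
        std::uint64_t old = state;
        state = old * 6364136223846793005ULL + inc;
        std::uint32_t xorshifted = static_cast<std::uint32_t>(((old >> 18u) ^ old) >> 27u);
        std::uint32_t rot = static_cast<std::uint32_t>(old >> 59u);
        return (xorshifted >> rot) | (xorshifted << ((32u - rot) & 31u));
    }

    // Hypothetical helper: float in [0, 1) built from the top 24 bits.
    float next_float() {
        return static_cast<float>(next() >> 8) * (1.0f / 16777216.0f);
    }
};
```

One instance per thread, each with its own `stream` value, would be one way to combine this with the per-thread approach discussed in the other comments.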
Author

I would be super interested in a video about CPU caches and architecture.

AVVI
Author

How can someone become this good at explaining things?

tweakedsam
Author

I don't like writing huge essays in a youtube comment section, but this was an amazing video! (Even with your camera overheating midway lmao)

hztcpfx
Author

Thanks man, this tool is OP. I already tried it today and it brought my FPS up from 1k to 3k <3

yni
Author

The parallelised part of the code is only 90% of the CPU time, so the theoretical minimum frame time for a frame (which takes 60ms when single-threaded) is 9.375ms - which you've pretty much reached. I'm interested to see what angle you attack it from next. - approximately 6.6ms

faizahmed
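
The estimate in the comment above is Amdahl's law applied to a frame time. A minimal sketch of the arithmetic follows; the function name is hypothetical, and the thread count of 16 used in the usage note is an assumption, chosen because it reproduces the 9.375 ms figure quoted above.

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law as a frame time: the serial part is unaffected by the thread
// count, while the parallel part is divided evenly across the threads.
inline double amdahl_frame_time_ms(double single_thread_ms,
                                   double parallel_fraction,
                                   unsigned threads) {
    assert(threads > 0);
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    double serial_ms = single_thread_ms * (1.0 - parallel_fraction);
    double parallel_ms = single_thread_ms * parallel_fraction;
    return serial_ms + parallel_ms / threads;
}
```

With a 60 ms single-threaded frame and 90% of the work parallelised, 16 threads give 6 + 54/16 = 9.375 ms. However many threads are added, the result never drops below the 6 ms serial part.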
Author

This video was an absolute masterpiece. 👏

mjthebest
Author

I also had a similar problem in my volume rendering app. Every single thread was writing to a shared variable in the ray-casting loop. The solution was simple: I just added one condition. Instead of "used = true" I just wrote "if(!used) used = true;". Now my program is 4 times faster than before.

jansvoboda
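
A minimal sketch of the check-before-write pattern from the comment above. The original presumably used a plain `bool`; `std::atomic<bool>` with relaxed ordering is an assumption made here so that the concurrent access is well defined in C++, and `mark_used` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: set a shared flag that many threads want to raise.
inline void mark_used(std::atomic<bool>& used) {
    // A load lets every core keep a shared copy of the cache line. An
    // unconditional store would make the writing core take exclusive
    // ownership of the line each time, evicting it from the other cores.
    if (!used.load(std::memory_order_relaxed)) {
        used.store(true, std::memory_order_relaxed);
    }
}
```

Once the flag is set, every later call is a read only, so the cache line stops bouncing between cores.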
Author

As soon as I saw that generating random numbers was slowing it down, I knew what the problem was - all the threads are using the same RNG!

JakobKenda
Author

An easy way of solving this is to use a simpler pseudo-random generator. Maybe a couple of XORs etc., so there is no state memory involved.

mr.mirror
Author

Learnt about thread_local here - thanks - my idea before learning about it while watching this video was to have an instance of the RNG for each object that uses it. For example, in the case of a pixel object, it would have a member RNG class instance. Interesting that we can just declare it as thread_local and it will be taken care of like that - thanks for sharing and I will look into this! Cheers!

duncancamilleri
Author

Mersenne Twister is an outdated pseudo-random number generator. It has good, proven mathematical properties but isn't particularly fast and takes an enormous amount of memory.
There are faster lighter PRNGs available today such as the xoshiro/xoroshiro family, the PCG family, the Romu family...
Most of them are well documented and often tackle the problem of multithreading.

romaing.