Think Parallel - Bryce Adelstein Lelbach - ACCU 2024

---

By default, we think sequentially. Parallelism in C++ is often seen as challenging and complex: a set of tools to be used sparingly and cautiously, and only by experts.

But we must shatter these assumptions, for today, we live in a parallel world. Almost every hardware platform is parallel, from the smallest embedded devices to the largest supercomputers.

We must change our mindset. Anyone who writes C++ code has to think in parallel. Parallelism must become our default.

In this example-driven talk, we will journey into the world of parallelism. We'll look at four algorithms and data structures in depth, comparing and contrasting different implementation strategies and exploring how they will perform both sequentially and in parallel.

During this voyage, we'll uncover and discuss some foundational principles of parallelism, such as latency hiding, localizing communication, and efficiency vs performance tradeoffs. By the time we're done, you'll be thinking in parallel.

Sponsored By think-cell
---

Bryce Adelstein Lelbach

Bryce Adelstein Lelbach has spent over a decade developing programming languages, compilers, and software libraries. He is a Principal Architect at NVIDIA, where he leads HPC programming language efforts and drives the technical roadmap for NVIDIA's HPC compilers and libraries. Bryce is passionate about C++ and is one of the leaders of the C++ community. He has served as chair of INCITS/PL22, the US standards committee for programming languages and the Standard C++ Library Evolution group. Bryce served as the program chair for the C++Now and CppCon conferences for many years. On the C++ Committee, he has personally worked on concurrency primitives, parallel algorithms, executors, and multidimensional arrays. He is one of the founding developers of the HPX parallel runtime system.
---

The ACCU Conference is the annual conference of the ACCU membership, but is open to any and all who wish to attend. The tagline for the ACCU is 'Professionalism in Programming', which captures the whole spectrum of programming languages, tools, techniques and processes involved in advancing our craft. While there remains a core of C and C++ - with many members participating in the respective ISO standards bodies - the conference, like the organisation, embraces other language ecosystems, and you should expect to see sessions on C#, D, F#, Go, JavaScript, Haskell, Java, Kotlin, Lisp, Python, Ruby, Rust, Swift and more. The ACCU Conference is a conference by programmers, for programmers, about programming.
Discounted rates for members.
---

#accuconf #parallelism #concurrency #programming #cplusplus
Comments

28:21 [slide 104] Although slide 35 (at 10:30) claims that the previous approach takes O(input) storage, the code behind it shows the ‘locals’ variable only taking num_tiles elements (each element is less than half the size of the scan_tile_state object). Great talk nonetheless!

Roibarkan

38:57 [slide 142] note that I think it’s ok to skip the for_each() call that adds ‘pred’ to every element in ‘indices’, and instead add ‘pred’ to each index inside the final output lambda (e.g. if (flag) out[pred+index] = e;)

Roibarkan

@16:10 the verbal commentary does not match the code shown, concerning the relaxed fetch_add(). Maybe you mean that the load stage is relaxed, but you did not equally emphasise that the store stage is not relaxed. Since there is a memory order whose name contains the word "relaxed", this creates further confusion for the listener.

In this situation I think you should have an acq_rel.

A relaxed operation indicates that you don't care about strict ordering of the increment, only that the RMW cycle is done with integrity, so that value corruption cannot occur.

A release operation indicates that you want other threads to gain visibility of the new data (if they check for it via an acquire). Using release alone says you don't care whether the initial value you use for the add is out of date, because you are not going to check with memory and may reuse a stale value from a nearer cache if one is available.

If I understand the operation correctly, it wants both strong visibility on the load (fresh data from other threads) and a store with strong visibility to notify interested parties that a change was made. This is why I say acquire-and-release, i.e. acq_rel, should be used.

Maybe on Intel x86 the above works for you, but it does not seem portable. If I understand the specific application correctly, you want the id to be unique, contiguous, and non-overlapping across all the threads. The current code as written would create an incrementing number from each thread's viewpoint, but some updates may be lost, and some threads may be allocated a lower number than expected, causing out-of-sequence tile ordering to be observable from a supervisor thread that loads with acquire. If the jump can become larger than your restricted worker resource pool, you may deadlock for the reasons stated earlier in the presentation.

Good presentation and visuals, thanks for your efforts.

DM-fwsu

cool talk, thanks, although this went way over my head 😆😕🤯 definitely gotta rewatch it lol

joshnjoshgaming

Slide 203 highlights the wrong column containing false.

MalcolmParsons

Watch Guy Steele's Talk on this subject instead

chadrickroper