Introduction to C++ Atomic Variables in Multithreaded Programming

A quick intro to C++ atomic variables and why you might want to use them when writing multithreaded code (or why you might NOT want to use them).

You should definitely check out this CppCon video to get a full explanation of atomics from the very basics right up to the advanced level.
CppCon 2017: Fedor Pikus “C++ atomics, from basic to advanced. What do they really do?”

Here is the Compiler Explorer link to the very simple program that shows the copy, modify, and write operation.
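
Below is a minimal sketch along the same lines (the exact code behind the Compiler Explorer link may differ): several threads bump a shared counter, once as a plain long and once as a std::atomic<long>, which is enough to show updates being lost to the separate copy, modify, and write steps.

#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    long plainSum = 0;                 // read-modify-write is NOT atomic: updates can be lost
    std::atomic<long> atomicSum{0};    // fetch_add makes the whole read-modify-write indivisible

    auto work = [&] {
        for (int i = 0; i < 1'000'000; ++i) {
            ++plainSum;                                        // data race: copy, modify, write
            atomicSum.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < 3; ++t) threads.emplace_back(work);
    for (auto& th : threads) th.join();

    std::cout << "plain:  " << plainSum << '\n';   // usually less than 3'000'000
    std::cout << "atomic: " << atomicSum << '\n';  // always exactly 3'000'000
}
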
Comments

No offense to the Indians; however, finally a clear instructional video from someone who speaks English. Keep up the good work.

alexb

A very simple and very easy to understand tutorial! Thanks.

njitgrad

Thanks, great explanation and healing voice!

cafelashowerezweb

Good topic and example. I just have a couple of points. The vast majority of programmers will be using Intel or AMD 32-bit x86/IA-32 or 64-bit x64/x86-64 architecture processors (x86 from now on). The Intel instruction set enables atomic semantics on fundamental data types - those which fit into the word size of the processor and which can be fetched or stored in a single memory access, such as bool, char, word (short), dword (int/int32 on x86-32), qword (long/int64 on x86-64) and so forth - by asserting the lock prefix on a single instruction that performs the atomic operation, such as add, sub, inc(rement), dec(rement) and some other more esoteric instructions such as compare-exchange. Instead of the loop that you described, an atomic add in a fully optimized build of a C++ program will, for example, do the following on an x86 architecture machine:

lock add [variable], value

Various (ordinary x86) forms of addressing are available to the locked instruction, but basically we're addressing memory and adding an immediate or register value. Without the lock prefix the processor would not synchronize operations with other processors and the data value could be corrupted (although IIRC reads of fundamental types on x86 are inherently interlocked). The lock prefix locks the data bus between processors, or, in modern processors, cores, to stop any other processor in the system accessing the locked address range. The locked instruction takes the same amount of time as the unlocked instruction, plus the time required to fetch and decode the lock prefix, which is a single byte. On modern processors the amount of extra time is so negligible as to be virtually unmeasurable. So on an x86 machine there would be virtually no difference in time between a thread performing a locked add and one performing the unlocked version. There would be a very small hit at hardware level but it will likely be entirely hidden by caching and memory management at CPU level. You could run an experiment by using the chrono high-performance counter and doing, say, a billion adds with a normal and a locked integer variable and comparing the results.
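
A rough sketch of that experiment with std::chrono (single-threaded, purely to compare the cost of the plain add against the locked one; the exact instructions emitted and the timings will vary by compiler and CPU):

#include <atomic>
#include <chrono>
#include <iostream>

int main() {
    constexpr long N = 1'000'000'000;   // "a billion adds"

    long plain = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < N; ++i) plain += 1;                                       // ordinary add (the optimizer may collapse this loop)
    auto t1 = std::chrono::steady_clock::now();

    std::atomic<long> locked{0};
    for (long i = 0; i < N; ++i) locked.fetch_add(1, std::memory_order_relaxed);   // lock add on x86
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::cout << "plain:  " << ms(t1 - t0).count() << " ms (result " << plain  << ")\n";
    std::cout << "atomic: " << ms(t2 - t1).count() << " ms (result " << locked << ")\n";
}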

You mentioned mutexes, which are more sophisticated locking primitives and can cause a context switch (the processor will execute some other runnable thread or process) if the thread cannot acquire the mutex. This really will lead to measurably worse performance. However, a single mutex won't lead to a deadlock situation (meaning no threads can run because they're all holding a lock primitive such as a mutex while simultaneously trying to acquire one already held by a different thread).

Lastly, the volatile add that you showed is a good example of the sort of behavior that will happen on a register-based load/store architecture machine like an ARM processor, but not on x86. It wasn't entirely accurate, since there was no comparing of the added value with the expected result and no looping until it was set indivisibly. Volatile tells the C++ compiler that the value of a variable might change between accesses, so it can't be cached in a register or aliased but must be accessed explicitly for every operation. So the traditional Intel add instruction, which does the read-add-write operation in a single instruction, is subdivided into three separate instructions. This is how it would appear on ARM or some other RISC architectures but not on Intel, unless you were manipulating an atomic variable that was not a fundamental type, in which case it might use a mutex or spinlock. FYI, use of the volatile keyword for this purpose is discouraged in modern variants of the language, and parts of it (such as compound assignment and increment/decrement on volatile variables) are deprecated from C++20 onwards.
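
A small illustration of that difference (not the video's code): with volatile the increment is still a separate load, add, and store, which is a data race if two threads do it concurrently, while std::atomic makes the whole read-modify-write indivisible.

#include <atomic>

volatile long vcount = 0;       // volatile: every access must go to memory, but the increment
void bump_volatile() {          // is still load -> add -> store, three separate steps another
    vcount = vcount + 1;        // thread can slip in between (and a data race in C++ terms)
}

std::atomic<long> acount{0};    // atomic: the read-modify-write is a single indivisible operation
void bump_atomic() {
    acount.fetch_add(1);        // on x86-64 this typically compiles to one "lock add"
}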

A good understanding of the processor architecture and instruction set can be garnered from the Intel x86 programming manuals, freely available on the Intel site (the ARM documentation is not nearly as detailed, in my experience).

treyquattro

Thank you for the example! It would have been cool to also inspect the generated assembly to really get a feel for how the compiler treats the atomic variable differently, but I know that's quite involved. Thanks for the link to the video as well, I'll be off to that now.

abdullaalmosalami
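
For anyone who wants to try that, pasting a pair of functions like these into Compiler Explorer with optimisation enabled (e.g. -O2) makes the difference easy to spot; the assembly in the comments is what x86-64 GCC/Clang typically emit, though exact output varies.

#include <atomic>

void add_plain(long& x) {
    x += 1;                                      // typically:  add QWORD PTR [rdi], 1
}

void add_atomic(std::atomic<long>& x) {
    x.fetch_add(1, std::memory_order_relaxed);   // typically:  lock add QWORD PTR [rdi], 1
}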

Very clear. This video makes me think multi-threading isn't that hard after all. Very nice!

ThePaullam

- I used to do a little coding myself... if you want to use multi-threading, I suggest atomic variables.
- Wait, that works?
- Yes, that's why I suggested it.

That's why I'm here 😂

mrreese

Finally clear English, what a relief....)))

stansem

Nice introduction to atomics!
However, std::atomic can be an order of magnitude faster than mutexes because its implementation doesn't always rely on locks. Depending on the size of the atomic value and the hardware capabilities, you can have pretty fast atomics in fact. They're slower than plain variables, but still way faster than mutexes.

Hadrazazel
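
The comment above is right, and you can check what you get on your own platform with is_lock_free / is_always_lock_free. A small sketch (is_always_lock_free needs C++17; with GCC you may need to link -latomic for the non-lock-free case):

#include <atomic>
#include <iostream>

struct Big { char bytes[64]; };   // too large for a single hardware atomic instruction

int main() {
    std::cout << std::boolalpha;
    std::cout << "atomic<int> lock-free: " << std::atomic<int>::is_always_lock_free << '\n';   // true on mainstream CPUs
    std::cout << "atomic<Big> lock-free: " << std::atomic<Big>{}.is_lock_free() << '\n';       // typically false: the library falls back to a lock
}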

Very nice explanation. It would be nice if you could make a video on memory_order.

Srkulkarni
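
While waiting for that video, here's a minimal release/acquire sketch (names are made up for illustration): the release store publishes everything written before it, and an acquire load that sees the flag is guaranteed to also see those earlier writes.

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int>  data{0};
std::atomic<bool> ready{false};

void producer() {
    data.store(42, std::memory_order_relaxed);     // ordinary atomic write
    ready.store(true, std::memory_order_release);  // publishes every write made before it
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {}    // pairs with the release store above
    assert(data.load(std::memory_order_relaxed) == 42);  // guaranteed once the acquire sees true
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}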

You're excellent! Outstanding post!

astral_md

Is the benefit of multithreading not canceled out by the overhead of atomics?

alltheway

touch on memory ordering, CPU cache coherence and such

spicy_wizard

Sorry, why does adding random numbers always return the same result?

alltheway

Thanks for the great video! What if we created sum1, sum2, sum3 for each thread, added up the sums, and then printed the total? Would that improve performance and prevent the incorrectness we got while using a normal long? Sorry for the noob question 😂

YahyaRahimov
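
The idea in the question above does work: if each thread accumulates into its own variable there is no shared write at all, so you get the correct answer without an atomic or a mutex, and usually better performance. A rough sketch (not the video's code; the data here is just a placeholder):

#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    const std::vector<int> data(10'000'000, 1);      // placeholder input
    const unsigned num_threads = 3;
    std::vector<long long> partial(num_threads, 0);  // one sum per thread, no sharing

    std::vector<std::thread> threads;
    const std::size_t chunk = data.size() / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            const std::size_t begin = t * chunk;
            const std::size_t end = (t == num_threads - 1) ? data.size() : begin + chunk;
            long long local = 0;                     // accumulate locally first
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[t] = local;                      // each thread writes only its own slot
        });
    }
    for (auto& th : threads) th.join();

    const long long total = std::accumulate(partial.begin(), partial.end(), 0LL);
    // total is always correct: no two threads ever write the same memory location
}

Accumulating into a local variable first also keeps the threads from repeatedly writing neighbouring elements of partial, which can hurt performance through false sharing.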

Dumb question. Would it be a problem if multiThreadSum is accessed by multiple threads at the same time? I mean, I've read that a memory location cannot be accessed by multiple threads at the same time.
Thanks :)

csmellow

Passing a const <className>& to a thread will not pass it by reference; you need to use std::ref (or std::cref) to do so.

ahmadalastal
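
That's correct: std::thread copies its arguments into storage it owns, so the callee's reference ends up bound to that internal copy. A small sketch (ClassName is just a stand-in):

#include <functional>   // std::ref, std::cref
#include <iostream>
#include <thread>

struct ClassName { int value = 42; };

void worker(const ClassName& obj) { std::cout << obj.value << '\n'; }

int main() {
    ClassName obj;
    // std::thread t(worker, obj);            // compiles, but worker sees a copy of obj
    std::thread t(worker, std::cref(obj));    // wraps obj so worker really refers to it
    t.join();
}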

Please increase the font size. It’s very difficult to read on a small screen.

YouLilalas

Would be nice if you used modern C++ (std::accumulate, std::rand, std::generate_n)

majestif
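
For what it's worth, a sketch of how the fill-and-sum part might look with those algorithms (using <random> instead of std::rand; not the video's code):

#include <algorithm>    // std::generate_n
#include <iostream>
#include <iterator>     // std::back_inserter
#include <numeric>      // std::accumulate
#include <random>
#include <vector>

int main() {
    std::vector<int> data;
    data.reserve(1'000'000);

    std::mt19937 gen{std::random_device{}()};        // properly seeded engine
    std::uniform_int_distribution<int> dist{0, 9};

    std::generate_n(std::back_inserter(data), 1'000'000, [&] { return dist(gen); });

    const long long sum = std::accumulate(data.begin(), data.end(), 0LL);
    std::cout << sum << '\n';
}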

I had to create more threads (5-8) to start getting different numbers from time to time; your program seems to always generate different numbers with just number_of_threads=3. Why do such things happen? I have a 4-core processor. BTW, good video.

JaSamZaljubljen