Single Producer Single Consumer Lock-free FIFO From the Ground Up - Charles Frasch - CppCon 2023

preview_player
Показать описание
---

Single Producer Single Consumer Lock-free FIFO From the Ground Up - Charles Frasch - CppCon 2023

In this session we will construct a Single Producer Single Consumer Lock-free Wait-free Fifo from the ground up. But why write our own if we can get one from reliable sources such as Boost.Lockfree? There are a couple of answers to this question.

* Writing such a fifo is a fairly gentle introduction to lock free programming.
* There are some interesting performance optimizations that can be made.
* There may be some specific requirements that are not met in out-of-the box implementations.

In the presentation we will first develop a simple circular fifo. Next we will make it thread-safe as well as lock-free and wait-free. After that we will address issues associated with cache coherency and false sharing in particular. We will show a few additional optimizations that can be added as needed to meet specific requirements. Finally, the performance of our fifo will be compared with a few out of the box implementations. Along the way we will touch on subjects such as thread safety, data-races, false sharing, object lifetime, and relaxed atomics.
---

Charles Frasch

Charles Frasch is a Senior Core Developer with the IEX Group where he is working to re-platform their core trading infrastructure in modern c++. Charles has worked in the financial technology world for more than twenty years in areas such as high-frequency trading and low-latency, high-performance infrastructure.
__

---

#cppcon #cppprogramming #cpp
Рекомендации по теме
Комментарии
Автор

I've probably given this tip to more people than any other, but if you're optimizing around an array of any variety, use a size that's a power of two. If you need to constrain an index into such an array, a `bitwise and` is the fastest way to do so. Never use division/modulus and oddly sized arrays, even when setting up the backing store for a hash table. Prime number sizes may yield better distribution, but it's always at the expense of performance. Also, with regards to hash tables, always store the unnormalized hash with the keys so that when you need to resize the table you don't need to query the hashing function again to rebuild the entire table.

anon_y_mousse
Автор

Great talk. We need more of this type of content in C++ talks!

fabiogaluppo
Автор

41:00 I had that issue too.
My solution was to not store objects in the queue, but just store pointers.
the new/delete are done outside. I used another fifo of "Available buffers", pre-allocated pointers that can be reused. Instead of new grab one from it, instead of delete push it back to be reused.
That way I get rid of the push/pop memcpy's, and ALSO of runtime allocations.
Problem is just that you have to handle when the push fails and the queue is full, you need to either throw the pointer back into the available buffers, or hold onto it until it can be pushed.

DedmenMiller
Автор

One of the best explanations for implementing a lock-free SPSC I've come across. Kudos!

natedoggstyle
Автор

Far better than my implementation which just used a try lock and gave up if it couldn't immediately do the action that it wanted to do!

georgeridout
Автор

I’m glad I’m not the only one struggling to understand memory ordering 😂
To understand this talk fully, I read chapter 5 from the book “C++ Concurrency in Action” (2nd edition), which goes into depth on the topic of memory ordering. Had to read it twice to understand everything, but it was worth it.
Thank you for this excellent talk!

zmweb
Автор

I completely misunderstood Roi's question about 1:03. Yes, I agree that if you grab multiple pushers or poppers in the same block they will be destructed in the reverse order in which they were acquired. And, if you do that you must carefully consider that fact. The pusher and popper destructors just advance the cursor. So, it is possible that at the end of the block the cursors are correct. But, my real answer is, if you need to read or write multiple values from and to the FIFO you should extend the API to add functions that do that correctly.

CharlesFrasch
Автор

Wow, what a marvelous presentation! I enjoyed every single second of it. I have always wanted some similar learning material to start with lock-free programming, and this one was just the right one. Thank you very much, Mr. Frasch.

amaama
Автор

So many good things explained so well in one go. Bravo.

starchsky
Автор

Me a noob thinking I was optimizing by messing with the std lib and working around those. This talk expanding my understanding of how you can truly optimize a system at a deep level.

ferinzz
Автор

I am sure I have missed it but in bool Fifo5::push(T const& value), how does
*pusher = value; end up invoking the pusher::get() when no access to the underlying Fifo5 object has been provided (operator*()) to allow access

tulipshaar
Автор

One way to handle a pointer request when near the end of the FiFo wrap around point is to allocate extra space past the main circular buffer. Then the get pointer function can in the rare cases where the buffer wraps around to do a copy into this extra space so that the caller has a linear buffer. and then the release can copy data back into the FiFo buffer. The only problem is when the FiFo can auto expand the buffer. In that case a mutex is needed and all callers can't be holding pointers. Luckily in my case I don't need lock free and the code just grabs a mutex for any operation and also keeps the mutex held between the getting of the pointer and the release of the pointer. I also use RAII in that my get pointer returns a class to release the pointer and the mutex.

LaserFur
Автор

Great talk! Just a warning though, it is not a working FIFO, once pushCursor will get to the end of max size_t, it will start overwriting the buffer at [0] and on wards.

AlexandreA-wc
Автор

slide 28, in push function, shouldn't we load popCursor with acquire first, and then load pushCursor with relaxed, correct me if I misunderstand this.

JiaChen-qp
Автор

That gentleman looks so much like the cousin of Bjourne, cool presentation

ksvishvajith
Автор

Don't we have to compare/exchange in order to get thread safety here?

rocknroooollllll
Автор

12:07 While this would work for most, this is not correct [when capacity is not 2's power]. It relies on size_t not overflowing.
(Which is a safe bet when it's 64 bit since it would take hundreds of years, but it is NOT when it's 32 bit or less, for example when running on a 32 bit or 8 bit microcontroller, where it only takes minutes to overflow.)

To demonstrate the problem create a FIFO with 10 capacity and set the cursors to (or push and pop that many times), then push 10 items into the FIFO (compiled to a 32 bit platform). The write index will be 0, 1, 2, 3, 4, 5, 0, 1, 2, 3. Meaning even though the FIFO was initially empty and had enough space to hold 10 items, it overwrites 4 of those items... and it will try to destruct some items twice.

Also I usually don't like using modulo operation for indexing. Because capacity is not a compile time constant modulo division won't be optimized which could be a problem (mostly on processors without hardware division unit) (even when capacity is 2's power). I'd rather limit the cursor to valid index range / capacity to make sure it at all times contains actual valid array index (which makes debugging easier). (of course you would need to differentiate between full and empty in another way)

I have written a number of lock free FIFOs for microcontrollers over the decades, so it seemed trivial to me that this is possible (even though the implementation might not be trivial, and may include memory barriers) I also realize that it might not be obvious for "modern"/young developers. I still wondered what could be said about them to fill out a 1 hour long presentation...
So I still watched it (on 2x speed) to see if he ever addresses this overflow issue, and the talk did contain some interesting pitfalls. This shows that multi-threading and modern processors are complicated, so nearly everyone should use standard libraries instead of rolling their own FIFO.

szirsp