CppCon 2019: Marshall Clow “std::midpoint? How Hard Could it Be?”

preview_player
Показать описание



The standards committee adopted "P0811: Well-behaved interpolation for numbers and pointers" for C++20.
It includes a new library call `std::midpoint`.
The paper says "The simple problem of computing a value between two other values is surprisingly subtle in general."

In this talk, I will explore this simple call, provide a history of the development in libc++, and show some of the pitfalls.
Undefined behavior will rear its ugly head, along with numeric representations, and the arcane C promotion rules.

Along the way, we'll talk about testing, and why writing extensive tests helps everyone.

Marshall Clow
C++ Alliance
Engineer
Marshall has been programming professionally for 35 yearsHe is the author of Boost.Algorithm, and has been a contributor to Boost for more than 15 years. He is the chairman of the Library working group of the C++ standard committee. He is the lead developer for libc++, the C++ standard library for LLVM.


*-----*
*-----*
Рекомендации по теме
Комментарии
Автор

I think an important (and pretty obscure) detail is missing when working with floats, Icite from wikipedia:

"the internal registers in the 8087 were designed to hold intermediate results in an 80-bit "extended precision" format. The 8087 automatically converts numbers to this format when loading floating-point registers from memory and also converts results back to the more conventional formats when storing the registers back into memory."

So the function actually returns an 80bit floating point that will be truncate somewhere in the future. If you are taking the result and perform some other operations, the compilier might decide to keep operating on the 80bit register and depending on compiler optimization you will get different results!

It can get worse: the same function can be inlined or not, depending on what operations will be done with the result, you don't know when the result will be truncated.

You can either add a volatile (AARRGGHHH) keyword, disable the 80bit register (AARRGHHH) or just realize the truth: there is no spoon.

ponchietto
Автор

Guy spends an hour explaining how not to write a buggy midpoint implementation.
Comments flooded with buggy midpoint implementations.

jamesflames
Автор

If any of the std algorithms uses a midpoint of iterator, it would be good to provide that since the code exists anyway. The binary_search comes to mind. If that (for example) only likes random access iterators, then having the same limitations on the midpoint function would be fine.

JohnDlugosz
Автор

Just starting this talk, and I'm sure to love it as I have enjoyed Marshall Clow's other talks. But I'm slightly hung up on the comment at 8:43 that ints and floats are “not numbers” and not “not math” and I suspect is just a slip-of-the-tongue. Yes, they absolutely are numbers and math—they're just not the ones we'd sometimes like them to be. By pretending, as one might, that ints and floats model the fields of integers (𝐙, +, -, *, /) and reals (𝐑, +, -, *, /), we make a fatal mistake. The computer types violate key axioms of a field: ints and floats are bounded, so the operations will not close over the sets of values and this is "overflow". There are other issues (including associativity and the difference between a number and representation) but I don't want to digress from the point: these are still perfectly mathematical objects: they're just not the ones that fit the convenient axioms we are used to.

TimTeatro
Автор

This proves, once again, that the devil is really in the details... Thanks for the information...

SahinKupusoglu
Автор

Very nice and interesting talk, thank you.

ggabriel
Автор

Does this hold for non-nan floats: a < b => (midpoint(a, b) == a => (a == b) || (a.next() == b)) ?

movaxh
Автор

46:50 actually it could be done in single pass without counting the elements. Just move one iterator by one and another one by two in each iteration and when the other one reaches the end the first one is in the middle.

kubixus
Автор

for floats: abs(a) < lo, instead of abs(b) < lo wasn't a mistake? It is still in the version on github.

kirepudsje
Автор

what is the difference between the cmps at 29:40 ?

YumekuiNeru
Автор

I'm not convinced that the branch is necessarily worse than the cmov in this instance, and I suspect it depends strongly on the use case. If the branch predictor is able to speculate correctly a decent fraction of the time, the branchful version will be faster.

It kind of sucks -- this function is indeed an obviously good candidate for standardization, but there are lots of little decisions that actually matter and which have no clear answer.

isodoublet
Автор

For integers you could just use Don Knuth's midpoint algorithm: (a & b) + ((a ^ b) >> 1)

alexandernyberg
Автор

Since the midpoint of two T* can't be used as a T*, is there a use case for midpoint of pointers?

abuyoyo
Автор

never knew that denormalized numbers fail "<= hi" check

NoNameAtAll
Автор

still for integers(and if casted for any pointers too, but not sophisticated iterators) i would prefer simpler version:
//return (a>>1) + (b>>1) + (((a&1) + (b&1))>>1); //sorry before edit
return (a>>1) + (b>>1) + (a&b&1);

zdnuijbsodfteu
Автор

For integers, I bet a/2 + b/2 + (b & 1) works great.

Omnifarious
Автор

Haven't you guys found a more serious problem to talk about for one hour?

ArtyomPalvelev