CppCon 2018: JF Bastien “Signed integers are two's complement”



There is One True Representation for signed integers, and that representation is two’s complement. There are, however, rumors of a fantasy world—before C++20—where ones' complement, signed magnitude and "pure binary representations" dwell. That world boasts Extraordinary Values, Padding Bits, and just like our world it hosts swaths of Undefined Behavior.

Join me in exploring this magnificent fantasy world, and discover its antics. Together we'll marvel at how the other representations were forever banished from real-world C++, doomed to cast mere shadows onto our reality.

JF Bastien, Compiler engineer
Apple

JF is a compiler engineer. He leads C++ development at Apple.


Comments
Author

Post-CppCon update! The final approved wording for C++20 is present in P1236R1 (as voted by the committee in November 2018 in San Diego). It has math-y wording (instead of my engineering wording), leaves a bit more implementation freedom for bool, and doesn't resolve LWG3047 atomic compound assignment (Library will resolve it separately for C++20, including resolving the same issue in atomic_ref and atomic<T*>).

Bourg
Author

55:30 As an American, there is no need to apologize. That is the sane way to write dates.

Nice to see the practical reality (that integers are 2's complement) and the spec align.

fdwr
Author

58:18 Bool always being guaranteed to be 0 or 1 may at first seem insignificant, but it's a great feature. You can now write:

out_of_bounds_count += (value > max_value); // comparisons always return type bool

which compiles to straight-line code, rather than

if (value > max_value)
    out_of_bounds_count++;

which (unless the compiler is Really Smart, often true but not always) compiles to branching code, which on modern processors may be mispredicted and flush the pipeline, slowing things down substantially. The first expression is also quite easy to read, once you see that the comparison returns 0 or 1.
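A minimal sketch of the pattern described above, wrapped in a function so it can be compiled and tried. The helper name `count_out_of_bounds` and the use of `std::vector<int>` are assumptions for illustration. Whether the compiler actually emits branch-free code depends on the compiler and target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts the elements that exceed max_value.
// (value > max_value) has type bool, and a bool converts to exactly 0 or 1
// when used in integer arithmetic, so no explicit branch is written.
std::size_t count_out_of_bounds(const std::vector<int>& values, int max_value) {
    std::size_t out_of_bounds_count = 0;
    for (int value : values) {
        out_of_bounds_count += (value > max_value);
    }
    return out_of_bounds_count;
}
```

For example, `count_out_of_bounds({1, 5, 9, 12}, 8)` counts the two elements 9 and 12.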

TranscendentBen
Author

There's so much here ... I learned sign-magnitude, ones and twos complements in the late 1970s, and from the 8 and 16 bit microprocessors at the time, it was clear things were leaning toward twos complement. I learned (or started learning) C in 1986 and indeed by then virtually everything used twos complement. I knew that things such as dereferencing a null pointer were "undefined behavior" but not that signed integer overflow was. I forget when I finally learned that but it was years or decades later. It always bothered me because I KNOW what the equivalent assembly/machine code does, and I saw the purpose of a compiler was to generate equivalent code to the source code, and overflow was a natural occurrence of exceeding the bounds of the integer size, and everyone knows what the twos complement result will be.

That's another thing, somewhere along the way - "integer" size was "clearly defined" by what original C standards there were as the size of the register word, but at least 16 bits (thus compilers generated 16 bit code even for 8 bit processors), and it commonly became 32 bits as compilers targeted newer 32 bit processors. C99 introduced the int8_t, uint8_t, int16_t, uint16_t etc. types and I thought to myself 20 years ago, why do people still use int? If you're not sure what processor you're targeting (my career has been embedded, so it could be 8-bit to 32-bit wordlength), or you're targeting several(!), you don't know how big an int is! So I started using the new types exclusively, so I always know variable size at a glance.

Interesting that you mention MATLAB doing saturation, but that's also the standard operation for most DSPs. I was reading in the late 1990s how "mainstream" processors were adding DSP instructions (such as MAC, multiply-accumulate, multiply two numbers and add the product to a register), and as I recall, they may have been adding a saturation mode as well. Saturation is a much more appropriate way (better approximation to what the signal "should be") to handle overflow than "wraparound" in signal processing. Of course saturation is NOT part of any C or C++ standard that I've heard of, yet C and C++ are used almost exclusively for DSP programming. Programmers just know and accept that that's how DSPs work.
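To make the contrast with wraparound concrete, here is a small sketch of saturating addition on 16-bit samples. The name `saturating_add` is a hypothetical helper written for this illustration, not something taken from the talk or from a DSP vendor library.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: adds two 16-bit samples and clamps the result to the
// representable range instead of wrapping around.
constexpr std::int16_t saturating_add(std::int16_t a, std::int16_t b) {
    // Widen to 32 bits so the intermediate sum cannot overflow.
    const std::int32_t sum = std::int32_t{a} + std::int32_t{b};
    constexpr std::int32_t lo = std::numeric_limits<std::int16_t>::min();
    constexpr std::int32_t hi = std::numeric_limits<std::int16_t>::max();
    if (sum > hi) return static_cast<std::int16_t>(hi);
    if (sum < lo) return static_cast<std::int16_t>(lo);
    return static_cast<std::int16_t>(sum);
}
```

With this sketch, adding 32000 and 1000 clamps to 32767, where a wrapping 16-bit add would produce a large negative value.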

But I can (now) see where different people expect certain things, and the standard committees have to somehow take these things into account. I've read such things about Microsoft Windows, people wrote production code that called Win system functions with wrong values, but the code still did something useful, and rather than "fixing" things MS has to make sure newer versions of Windows still work with such improper calls so that older apps don't break.

TranscendentBen
Author

3:22 how's that supposed to work? What about overflows(2, -1)? Don't consider UB at this stage. That code won't work in the first place.
6:40 won't that give a compiler warning since an unsigned is compared to a signed?

thomasweller
Author

I think it is good that signed integers are being defined as two's complement, but that by itself is not going to solve signed integer overflow being undefined. The representation was always implementation-defined, and the alternatives are just cruft worth removing from the standard. No machines have used them for the last 30 years. Maybe there were some emulators for PDP-11 using them, but that is all. If you want to run old stuff on these machines (and there are probably fewer than 10 people using them), just stick to an old compiler version. Done.

movaxh
Author

18:50 Atomic Gandhi was pretty much disproven by the developers. They just did not read the state in a way where overflow could have mattered. Cool story, just sadly not real.

seditt
Author

12:25 Damn. What a prophet. I watched this video and checked how the D compiler deals with this on my platform. It did well on the "obvious" code (mainly because D has defined behavior on integer overflow and a defined integer representation), but not so well on the "workaround" cases. So yes, I did file some bugs against gcc and llvm. :D Fortunately I can use __builtin_sadd_overflow in gdc very easily, and yes, it generates optimal code (especially after inlining).

movaxh
Author

So, does this mean one can now rely on and assume 2's complement implementation after this passes the committee?

User-cvee
Author

54:10 _what_? Why on earth would you consider "char" to be signed, given that in practice it means "a byte from a UTF-8 string, or maybe a string that uses some legacy 8-bit encoding"?

MatthijsvanDuin
Author

5:32 how is that code working? If lhs is INT_MAX and rhs is 1, you'll end up with an unsigned int with the value of "INT_MAX +1", which is roughly UINT_MAX/2, and isn't less than INT_MAX. So you're not detecting overflow from positive to negative ints, are you?
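For reference, one common way to test for signed addition overflow without ever performing the overflowing addition is sketched below. This is not the code shown in the talk; the name `add_would_overflow` is a hypothetical helper chosen for this illustration.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: returns true if lhs + rhs would not fit in an int.
// The comparisons are arranged so that no intermediate expression overflows.
constexpr bool add_would_overflow(int lhs, int rhs) {
    if (rhs > 0) return lhs > INT_MAX - rhs;  // would exceed INT_MAX
    if (rhs < 0) return lhs < INT_MIN - rhs;  // would fall below INT_MIN
    return false;                             // adding zero never overflows
}
```

Under this sketch, `add_would_overflow(INT_MAX, 1)` is true, while a mixed-sign case such as `add_would_overflow(2, -1)` is false.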

obfuscator
Author

Wouldn't having both unsigned and signed overflow be UB break some std::hash algorithms?

iddn
Author

good evidence out of the math guys that sum infinite series that twos complement is more fundamental than programmers realize

styleisaweapon
Author

Modern Use of Something Other Than 2's Complement (and it's not just MATLAB):

TranscendentBen
Author

So, if you make the storage two's complement, but integer overflow is still UB, then you cannot really rely on addition wrapping on overflow? So I do not really see the point... Unless you do the addition yourself, but then that is way less expressive than writing a+b or using builtins.


I do not really see the point of many of the suggestions. Take the overflow one, for example: all it would fix is that the overflow check would be nicer (and to me that check is too weird anyway; it is much more expressive to cast to unsigned and then check) and that a bunch of infinite loops caused by optimization would disappear. But is it really worth it?


In the end, if I write something like (a+b) < a for natural numbers, I just wrote a statement which is always false for positive b, and integers are supposed to represent integer numbers. So the overflow check at the beginning is just madness to me. Because you are reasoning in terms of internal storage, instead of what an integer is supposed to represent...
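"Doing the addition yourself" usually means going through unsigned arithmetic, which is defined to wrap. A minimal sketch follows; the helper name `wrapping_add` is an assumption for illustration and is not a standard function.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: two's complement wrapping addition for int.
constexpr int wrapping_add(int a, int b) {
    // Unsigned arithmetic is defined to wrap modulo 2^N, so this never has UB.
    const unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
    // Converting back to int is modular since C++20. Before C++20 the result
    // for out-of-range values was implementation-defined (two's complement
    // wrapping on mainstream compilers).
    return static_cast<int>(sum);
}
```

With this helper, `wrapping_add(INT_MAX, 1)` yields `INT_MIN` instead of invoking undefined behavior, at the cost of being wordier than `a + b`.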

filippol
Author

If volatile goes away, how does memory mapped IO work?

timothymusson
Author

sizeof(void *) == 8: is this implying that C++ is not for use in the embedded (32 bit) world?

cbehopkins
Author

EDIT: EVERY time I write a comment, like magic, it's addressed a minute later XDXD

Regarding overflow... It's insane people would have to write code to check it. I remember from school, CPU will _tell you*_ in a register. Shouldn't there be a built in way to check?
Some kind of "add with check" like:
add a b
rslt = eax
didOverflow = <bit from special register with info about last operation>

... I always assumed this is how it is implemented...

* and I remember the nice diagram, showing the carry bit setting the overflow flag
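GCC and Clang expose roughly this "add with check" as a compiler extension, `__builtin_add_overflow`, which on common targets can compile down to an add followed by a test of the overflow flag. The sketch below wraps it; the `AddResult` struct and `checked_add` name are hypothetical, and the builtin itself is not standard C++.

```cpp
#include <cassert>
#include <climits>

// Hypothetical result type: the (wrapped) sum plus an overflow indicator.
struct AddResult {
    int value;
    bool did_overflow;
};

// Hypothetical wrapper around the GCC/Clang extension __builtin_add_overflow.
// The builtin stores the wrapped sum through the pointer and returns true
// if the mathematically exact sum did not fit in the result type.
inline AddResult checked_add(int a, int b) {
    AddResult r{};
    r.did_overflow = __builtin_add_overflow(a, b, &r.value);
    return r;
}
```

For example, `checked_add(INT_MAX, 1).did_overflow` is true, while `checked_add(1, 2)` reports no overflow and a value of 3.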

Verrisin
Author

I would like to have fixed point representation for numbers between 1 and 0. Seems both very fast, and relevant now for neural networks.
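As a rough illustration of what such a type could look like, here is a sketch of an unsigned 16-bit fixed-point fraction. Everything in it (the `UQ16` name, the choice of 16 fractional bits, truncating multiplication) is an assumption for this example. Note that this layout covers the range [0, 1) and cannot represent 1.0 exactly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type: unsigned fixed point with 16 fractional bits.
// The represented value is raw / 65536, so it lies in [0, 1).
struct UQ16 {
    std::uint16_t raw;
};

// Multiplies two fractions: widen, multiply, then drop the low 16 bits.
// The result is truncated toward zero rather than rounded.
constexpr UQ16 mul(UQ16 a, UQ16 b) {
    const std::uint32_t product = std::uint32_t{a.raw} * std::uint32_t{b.raw};
    return UQ16{static_cast<std::uint16_t>(product >> 16)};
}

// Converts to double for inspection; not meant for the fast path.
constexpr double to_double(UQ16 x) {
    return x.raw / 65536.0;
}
```

Here `UQ16{0x8000}` represents 0.5, and multiplying it by itself gives `UQ16{0x4000}`, which is 0.25.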

NicolayGiraldo
Author

Better solution: we sell a cheaply available CPU (like the Raspberry Pi) that uses ones' complement, and one that uses sign-magnitude, so that people can test and fix their non-portable code. If code does not work on a big-endian, sign-magnitude machine, it is broken.

ssl