CppCon 2018: JF Bastien “Signed integers are two's complement”


—
There is One True Representation for signed integers, and that representation is two’s complement. There are, however, rumors of a fantasy world—before C++20—where ones' complement, signed magnitude and "pure binary representations" dwell. That world boasts Extraordinary Values, Padding Bits, and just like our world it hosts swaths of Undefined Behavior.

Join me in exploring this magnificent fantasy world, and discover its antics. Together we'll marvel at how the other representations were forever banished from real-world C++, doomed to cast mere shadows onto our reality.
—
JF Bastien, Compiler engineer
Apple

JF is a compiler engineer. He leads C++ development at Apple.
—


Comments

Post-CppCon update! The final approved wording for C++20 is in P1236R1 (as voted by the committee in November 2018 in San Diego). It has math-y wording (instead of my engineering wording), leaves a bit more implementation freedom for bool, and doesn't resolve LWG3047 atomic compound assignment (Library will resolve it separately for C++20, including resolving the same issue in atomic_ref and atomic<T*>).

Bourg

55:30 As an American, there is no need to apologize. That is the sane way to write dates.

Nice to see the practical reality (that integers are 2's complement) and the spec align.

fdwr

58:18 Bool always guaranteed to be 0 or 1 may seem at first insignificant, but it's a great feature, you can now write:

    out_of_bounds_count += (value > max_value); // comparisons always return type bool

which compiles to straight-line code, rather than

    if (value > max_value)
        out_of_bounds_count++;

which (unless the compiler is Really Smart, often true but not always) compiles to branching code, which on modern processors may be mispredicted and flush the pipeline, slowing things down substantially. The first expression is also quite easy to read, once you see that the comparison yields 0 or 1.
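A minimal sketch of the counting idiom described above. The function name `count_out_of_bounds` and the use of `std::vector<int>` are assumptions made for illustration. Whether the loop actually compiles to branch-free code depends on the compiler and optimization level.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts the elements that exceed max_value.
// The comparison yields a bool; converting it to an integer gives 0 or 1,
// so it can be added to the counter directly.
std::size_t count_out_of_bounds(const std::vector<int>& values, int max_value) {
    std::size_t out_of_bounds_count = 0;
    for (int value : values) {
        out_of_bounds_count += (value > max_value);
    }
    return out_of_bounds_count;
}
```

For the input {1, 5, 10, 20} with max_value 5, only 10 and 20 exceed the limit, so the count is 2.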

TranscendentBen

There's so much here ... I learned sign-magnitude, ones' complement and two's complement in the late 1970s, and from the 8- and 16-bit microprocessors of the time it was clear things were leaning toward two's complement. I learned (or started learning) C in 1986, and indeed by then virtually everything used two's complement. I knew that things such as dereferencing a null pointer were "undefined behavior" but not that signed integer overflow was. I forget when I finally learned that, but it was years or decades later. It always bothered me because I KNOW what the equivalent assembly/machine code does, and I saw the purpose of a compiler as generating code equivalent to the source code. Overflow was a natural consequence of exceeding the bounds of the integer size, and everyone knows what the two's complement result will be.

That's another thing, somewhere along the way - "integer" size was "clearly defined" by what original C standards there were as the size of the register word, but at least 16 bits (thus compilers generated 16 bit code even for 8 bit processors), and it commonly became 32 bits as compilers targeted newer 32 bit processors. C99 introduced the int8_t, uint8_t, int16_t, uint16_t etc. types and I thought to myself 20 years ago, why do people still use int? If you're not sure what processor you're targeting (my career has been embedded, so it could be 8-bit to 32-bit wordlength), or you're targeting several(!), you don't know how big an int is! So I started using the new types exclusively, so I always know variable size at a glance.

Interesting that you mention MATLAB doing saturation, but that's also the standard operation for most DSPs. I was reading in the late 1990s how "mainstream" processors were adding DSP instructions (such as MAC, multiply-accumulate, multiply two numbers and add the product to a register), and as I recall, they may have been adding a saturation mode as well. Saturation is a much more appropriate way (better approximation to what the signal "should be") to handle overflow than "wraparound" in signal processing. Of course saturation is NOT part of any C or C++ standard that I've heard of, yet C and C++ are used almost exclusively for DSP programming. Programmers just know and accept that that's how DSPs work.
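Since saturation is not built into the C or C++ arithmetic operators, it has to be written out by hand (or via vendor intrinsics on a DSP). A rough sketch for 16-bit samples, assuming a hypothetical helper named `saturating_add`:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: adds two 16-bit samples and clamps the result to the
// int16_t range instead of letting it wrap around.
std::int16_t saturating_add(std::int16_t a, std::int16_t b) {
    // Widen to 32 bits first so the intermediate sum cannot overflow.
    const std::int32_t sum = std::int32_t{a} + std::int32_t{b};
    const std::int32_t hi = std::numeric_limits<std::int16_t>::max();
    const std::int32_t lo = std::numeric_limits<std::int16_t>::min();
    if (sum > hi) return static_cast<std::int16_t>(hi);
    if (sum < lo) return static_cast<std::int16_t>(lo);
    return static_cast<std::int16_t>(sum);
}
```

With this sketch, 30000 + 10000 clamps to 32767 where a wrapping add would produce a large negative value, which is why saturation is the better approximation for signal data.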

But I can (now) see where different people expect certain things, and the standard committees have to somehow take these things into account. I've read such things about Microsoft Windows, people wrote production code that called Win system functions with wrong values, but the code still did something useful, and rather than "fixing" things MS has to make sure newer versions of Windows still work with such improper calls so that older apps don't break.

TranscendentBen

3:22 how's that supposed to work? What about overflows(2, -1)? Don't consider UB at this stage. That code won't work in the first place.
6:40 won't that give a compiler warning since an unsigned is compared to a signed?

thomasweller

I think it is good that signed integers are being defined as two's complement, but that by itself is not going to solve signed integer overflow being undefined. The representation was always implementation-defined, and the alternatives are just cruft worth removing from the standard. No machines have used them for the last 30 years. Maybe there were some emulators for PDP-11 using them, but that is all. If you want to run old stuff on these machines (and there are probably fewer than 10 people using them), just stick to an old compiler version. Done.

movaxh

18:50 Atomic Gandhi was pretty much disproven by the developers. They just did not read state in a way where overflow could have mattered. Cool story, just sadly not real.

seditt

12:25 Damn. What a prophet. I watched this video and checked how the D compiler deals with this on my platform. It did well on the "obvious" code (mainly because D has defined behavior on integer overflow and a defined integer representation), but not so well on the "workaround" cases. So yes, I did file some bugs against gcc and llvm. :D Fortunately I can use __builtin_sadd_overflow in gdc very easily, and yes, it generates optimal code (especially after inlining).

movaxh

So, does this mean one can now rely on and assume a 2's complement implementation after this passes the committee?

User-cvee

54:10 _what_? why on earth would you consider "char" to be signed, given that in practice it means "a byte from a UTF-8 string or maybe a string that uses some legacy 8-bit encoding"?

MatthijsvanDuin

5:32 how is that code working? If lhs is INT_MAX and rhs is 1, you'll end up with an unsigned int with the value of "INT_MAX +1", which is roughly UINT_MAX/2, and isn't less than INT_MAX. So you're not detecting overflow from positive to negative ints, are you?

obfuscator

Wouldn't having both unsigned and signed overflow be UB break some std::hash algorithms?

iddn

good evidence out of the math guys that sum infinite series that twos complement is more fundamental than programmers realize

styleisaweapon

Modern Use of Something Other Than 2's Complement (and it's not just MATLAB):

TranscendentBen

So, if you make the storage two's complement, but integer overflow is still UB, then you cannot really rely on addition wrapping on overflow? So I do not really see the point... Unless you do the addition yourself, but then that is way less expressive than writing a+b or using builtins.

I do not see really the point of many suggestions. The overflow thing for example: the only example it would fix is that the overflow check (which to me is too weird anyways, much more expressive to cast to unsigned and then check) would be nicer and a bunch of infinite loops would disappear due to optimization, but is it really worth it?

In the end, if I write something like (a+b) < a for natural numbers, I just wrote a statement which is always false for positive b, and integers are supposed to represent integer numbers. So the overflow check at the beginning is just madness to me. Because you are reasoning in terms of internal storage, instead of what an integer is supposed to represent...

filippol

If volatile goes away, how does memory mapped IO work?

timothymusson

sizeof(void *) == 8: is this implying that c++ is not for use in the embedded (32 bit) world?

cbehopkins

EDIT: EVERY time I write a comment, like magic, it's addressed a minute later XDXD

Regarding overflow... It's insane people would have to write code to check it. I remember from school, CPU will _tell you*_ in a register. Shouldn't there be a built in way to check?
Some kind of "add with check" like:
add a b
rslt = eax
didOverflow = <bit from special register with info about last operation>

... I always assumed this is how it is implemented...

* and I remember the nice diagram, showing the carry bit setting the overflow flag
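Something close to this "add with check" does exist as a compiler extension. The sketch below assumes GCC or Clang, whose `__builtin_add_overflow` stores the wrapped result and returns true if the mathematical result did not fit. It is not part of standard C++, and the wrapper name `add_did_overflow` is hypothetical.

```cpp
#include <limits>

// Hypothetical wrapper around the GCC/Clang builtin (a compiler extension,
// not standard C++). Stores the wrapped sum in `result` and returns true if
// the exact sum did not fit in an int.
bool add_did_overflow(int a, int b, int& result) {
    return __builtin_add_overflow(a, b, &result);
}

// Convenience constant for demonstrating the overflowing case.
constexpr int kIntMax = std::numeric_limits<int>::max();
```

Calling `add_did_overflow(2, 3, r)` returns false and leaves 5 in `r`, while `add_did_overflow(kIntMax, 1, r)` returns true. On common targets compilers typically lower this to an add followed by a check of the CPU's overflow flag, though that is up to the compiler.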

Verrisin

I would like to have a fixed-point representation for numbers between 0 and 1. Seems both very fast and relevant now for neural networks.

NicolayGiraldo

better solution - we sell a cheaply available CPU that uses 1s complement and one that uses sign-magnitude (like the Raspberry Pi) so that people can test and fix their non-portable code. if code does not work on a big-endian, sign-magnitude machine it is broken.

ssl
