CppCon 2016: Chandler Carruth “Garbage In, Garbage Out: Arguing about Undefined Behavior...”


There has been an overwhelming amount of tension in the programming world over the last year due to something that has become an expletive, a cursed and despised term, both obscene and profane: **undefined behavior**. All too often, this issue and the discussions surrounding it descend into unproductive territory without actually resolving anything.

In this talk, I'm going to try something very bold. I will try to utterly and completely do away with the use of the term "undefined behavior" in these discussions. And I will unquestionably fail. But in the process of failing, I will outline a framework for understanding the actual root issues that the software industry faces here, and try to give constructive and clear paths forward, both for programmers and the programming language.

And, with luck, I will avoid being joined on stage by any unruly nasal demons.

Chandler Carruth
Google
C++ Lead
San Francisco Bay Area
Chandler Carruth leads the Clang team at Google, building better diagnostics, tools, and more. Previously, he worked on several pieces of Google’s distributed build system. He makes guest appearances helping to maintain a few core C++ libraries across Google’s codebase, and is active in the LLVM and Clang open source communities. He received his M.S. and B.S. in Computer Science from Wake Forest University, but disavows all knowledge of the contents of his Master’s thesis. He is regularly found drinking Cherry Coke Zero in the daytime and pontificating over a single malt scotch in the evening.


Comments

Not gonna lie; hearing Chandler criticize the standard put a big ass smile on my face. I've lost count of the number of times I've tried to discuss possible and legitimate issues with languages with other developers over the years, only to have them fanboy up and refuse to admit there could be anything wrong with their perfect little language.

This isn't to say I was right in all cases, but the general stubbornness to admit there could possibly be an issue, or that something could be done better, has just absolutely driven me nuts over the years. So frankly it's refreshing to just hear someone else criticize one for a while, much less someone with as much weight and authority as Chandler.

jacob_s

Having Chandler in the committee gives me hope for the future of the language. Good talk as always!

kewqie

I guess the best thing for the "narrow vs wide contract" is compiler warnings: keeps C/C++ narrow(ness), but not without trying to save us.

MrAbrazildo

people are not upset because the effects of violations are latent (hence, write more sanitizers) but because the contracts themselves are stupid

UGPepe

47th slide is so confusing... "good" and "bad" words don't correspond to UB...

NoNameAtAll

x << y is defined behavior (DB) on every platform, even if that behavior differs from one platform to the next. we're all ok with that. C was never intended to program an abstract machine but actual hardware. we already have java for that.

UGPepe

36:36 this error is actually not related to overflows at all. it behaves *exactly* as expected: we allocate
16 + (0-1)*8 = 16 + (-1)*8 = 16 - 8 = 8
bytes. so basically, we say that since we don't want any of the 8-byte rtunions, we'll just not allocate any - not even the one in the latter 8 bytes of the 16-byte rtvec_def. which means we'll try to allocate 8 bytes for a 16-byte struct, which is of course an error. but the error is a completely logical error and is not related to overflows at all.

smiley_

Interesting talk. The example of why one should prefer signed integers is really great.

I think the key problem with UB (and why it is so hated) is that the compiler is allowed not only to do something terrible, but also to skip doing something. For instance:

for (int i = 0; i < 10; ++i) cout << i *

Here the compiler can rely on integer overflow in i * not happening, so i < 3, so the compiler can safely remove the loop condition check and make this loop infinite.

But if you are not aware of such cases, then WAT.

kostikvl
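
A small sketch of the reasoning in the comment above. The multiplier in the loop is not shown in the comment, so the value 1000000000 used here is purely an assumption for illustration, as is the helper name `largest_safe_index`. The result also assumes a 32-bit `int`. The helper uses division, so it never overflows itself.

```cpp
#include <climits>

// Assumed multiplier: the constant in the original comment is not shown.
constexpr int kAssumedMultiplier = 1000000000;

// Largest non-negative i for which i * multiplier still fits in an int.
// Uses division, so this computation cannot overflow.
constexpr int largest_safe_index(int multiplier) {
    return INT_MAX / multiplier;
}

// With a 32-bit int this is 2, i.e. the product is only valid for i < 3.
constexpr int kLargestSafe = largest_safe_index(kAssumedMultiplier);
```

Under these assumptions, a compiler that treats signed overflow as impossible may conclude that `i` never reaches 3, which makes `i < 10` always true.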


I don't see the presentation at the supplied location.

jmille

37:00 Doesn't the math actually work out here to produce the expected result? Chandler says that unsigned multiplication is defined as modular arithmetic, so the calculation should go as follows:


Promote -1 to unsigned -> 2^32 - 1 (or 2^64 - 1, doesn't matter)
((2^32 - 1) * 8) mod 2^32 = 2^32 - 8
(8 + (2^32 - 8)) mod 2^32 = 0


And 0 is the exact size you would expect when the input is n = 0.

kered
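
The three steps in the comment above can be checked with 32-bit unsigned arithmetic. This is only a sketch of that calculation: the constant names are hypothetical, and the starting value 8 is taken from the comment, not from the slide.

```cpp
#include <cstdint>

// Step 1: converting -1 to a 32-bit unsigned type gives 2^32 - 1.
constexpr std::uint32_t kMinusOne = static_cast<std::uint32_t>(-1);

// Step 2: multiplying by 8 wraps modulo 2^32, giving 2^32 - 8.
constexpr std::uint32_t kScaled =
    static_cast<std::uint32_t>(kMinusOne * std::uint32_t{8});

// Step 3: adding 8 wraps once more and lands on 0.
constexpr std::uint32_t kTotal =
    static_cast<std::uint32_t>(std::uint32_t{8} + kScaled);
```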

Why not make a shift by more than the number of bits in the type a compile error then? It's super common for these to be known at compile time. For the runtime case, make the operator auto-cast to a range type which has compiler-configurable behaviour and can either be free and undefined if wrong, or throw an exception, or perform the shift modulo sizeof(type)*8?

MrMidjji
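
A rough sketch of two of the runtime policies suggested above, written as plain helper functions instead of a language change. The names `shl_masked` and `shl_checked` are hypothetical, and fixing the operand to `std::uint32_t` is an assumption made to keep the example short.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Bit width of the shifted type (32 for std::uint32_t).
constexpr unsigned kBits = std::numeric_limits<std::uint32_t>::digits;

// Policy 1: reduce the shift count modulo the bit width.
constexpr std::uint32_t shl_masked(std::uint32_t value, unsigned count) {
    return value << (count % kBits);
}

// Policy 2: reject out-of-range shift counts with an exception.
inline std::uint32_t shl_checked(std::uint32_t value, unsigned count) {
    if (count >= kBits) {
        throw std::out_of_range("shift count exceeds bit width");
    }
    return value << count;
}
```

Both helpers pay for a check or a mask on every call, which is the cost the "free and undefined" option avoids.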

The example on slide 48 actually produces the mathematically correct value when n=0. You would get the same value if the numbers were signed. The issue isn’t the overflow, it’s that 0 should never have been input in the first place (as the resulting 8 bytes are insufficient space for the rtvec_def object).

dannystoll
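
To illustrate the claim above that signed and unsigned arithmetic agree for n = 0, here is a minimal sketch. It assumes the 16-byte header and 8-byte element sizes quoted elsewhere in this thread. The function names are hypothetical and this is not the code from the slide.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed sizes, taken from figures quoted in this thread.
constexpr std::size_t kHeaderBytes = 16;
constexpr std::size_t kElemBytes = 8;

// Unsigned version: (n - 1) wraps for n == 0, and so does the final sum.
constexpr std::size_t bytes_unsigned(std::size_t n) {
    return kHeaderBytes + (n - 1) * kElemBytes;
}

// Signed version: plain integer arithmetic, no wrapping involved for n == 0.
constexpr std::int64_t bytes_signed(std::int64_t n) {
    return 16 + (n - 1) * 8;
}
```

Both versions return 8 for n = 0, which is smaller than the assumed 16-byte header.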

Regarding integer overflow, the sensible behavior would be for the compiler to apply optimizations when it can prove that the overflow won't occur. When it can't prove that overflow won't occur, and it is doing that optimization, that's a security exploit waiting to happen.

dizekat

47:53 This should be a compile error. Like, "Error: you can't use 32 bit values for 64 bit pointer indexing". The compiler has all the information for that. No silent promotion to 64 bits is needed. One can use (u)int_fast32_t types for platform specific size.

iamvfx

I understand that not all UB can be defined, but a lot of them could at least be implementation defined (or require diagnostic output, and not be silent UB).
For example, just saying that casting a byte/character pointer type to a pointer of float type is undefined behavior could mean that the compiler can just stop generating instructions for the code after it encountered UB (since it's UB anyway, it doesn't matter what the rest of the code would do). But we know that this should be fine on all platforms if the pointer is properly aligned, and it's fine on some platforms even if it is not aligned. This could/should be implementation defined even if it just says the behavior is architecture dependent, but the compiler (or toolchain) can guarantee that it will at least output machine code/instructions and not just give up on you.
I'm fine if the compiler says to me "Hey, we don't know what this will do, but at least we tried. Maybe you should look at this once more if this is really what you want." instead of the compiler doing 'Look at this stupid human! I recognize that this is UB, so I'm just gonna stop trying to even generate code, and I'm not gonna tell anyone, not gonna notify the user/programmer'.

szirsp
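
For the byte-pointer-to-float case discussed above, one commonly used way to stay within defined behavior today is to copy the bytes with `std::memcpy` instead of casting the pointer. This is a sketch of that alternative, not something proposed in the comment. The helper name `load_float` is hypothetical.

```cpp
#include <cstring>

// Reads a float out of raw bytes without casting the pointer type.
// Works regardless of the alignment of `bytes`.
inline float load_float(const unsigned char* bytes) {
    float value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

Usage: fill an `unsigned char` buffer of `sizeof(float)` bytes, for example from a file or a network packet, and pass it to `load_float`.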

the simplest way to detect a cyclic graph is to keep a single counter for node traversal and check it against the total graph size

ephimp
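
A minimal sketch of the counter idea from the comment above. It assumes the simplest possible shape: every node has at most one successor, stored as an index in a vector, with -1 marking the end of a chain. That representation and the name `has_cycle` are assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

// next[i] is the index of node i's successor, or -1 if it has none.
// If we take more steps than there are nodes, some node was visited twice.
inline bool has_cycle(const std::vector<int>& next, int start) {
    std::size_t steps = 0;
    for (int node = start; node != -1;
         node = next[static_cast<std::size_t>(node)]) {
        if (++steps > next.size()) {
            return true;  // visited more nodes than the graph contains
        }
    }
    return false;  // reached the end of the chain
}
```

The trade-off is that detection only happens after as many steps as there are nodes, even when the cycle is short.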

The biggest issue with undefined behavior to me is that it's poorly named. Undefined behavior doesn't sound scary. It sounds like "I don't know if it's cloudy tomorrow". Nobody's scared of it being cloudy or not tomorrow; they'll still wake up and go to work despite not knowing beforehand which it is. They dress accordingly. And then you also see these "this is actually undefined behavior" remarks, like in some Sean Parent talks I remember, and it ends up being something that doesn't do anything interesting, really nothing to worry about, but the standard has not defined accurately what it should do. And some code depends on undefined behavior. It just doesn't sound too scary because it's such an enigma: nobody has said what it should be doing, so technically it could do anything imaginable and unimaginable (but many times it also won't do anything bad, and may even do what was desired).

So what I'm understanding is that the committee can't fix bad programming and illegal use of the language?

Yupppi

What if you have a "char*" that you increment? Does it do the same nasty wraparound handling, or is it as fast as a signed int?

grisevg

defining behavior for all platforms doesn't mean that the behavior has to be the same for all platforms, how on earth did that implication come about?

UGPepe

47:08 - Today it's fairly easy to get enough bits, without having to use the sign bit.
47:26 - The downside of using size_t is if ... is in a data structure it uses more space.

So, it is easy to get enough bits, except that it's not.

I think that using size_t is the ideal solution. It gives you efficient code *and* it ensures that the code will work if you ever need to sort 10 GB of data. If your structures don't need to support more than 4 GB of data then you can store uint32_t in your structures and load into a size_t local variable.

OneWheelGuy
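
The last suggestion above, storing `uint32_t` in the structure and widening to `size_t` for the actual indexing, might look roughly like this. The `Slice` type and `sum_slice` function are hypothetical names made up for the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Compact record: two 32-bit fields instead of two size_t fields.
struct Slice {
    std::uint32_t offset;
    std::uint32_t length;
};

// Sums data[offset .. offset + length). The caller must ensure the
// slice lies within data; this sketch does no bounds checking.
inline long long sum_slice(const std::vector<int>& data, const Slice& s) {
    // Widen once into size_t locals so the loop index has pointer width.
    const std::size_t begin = s.offset;
    const std::size_t end = begin + s.length;
    long long total = 0;
    for (std::size_t i = begin; i < end; ++i) {
        total += data[i];
    }
    return total;
}
```

The stored fields stay small, while the loop counter is a `size_t`, so the compiler does not need to model 32-bit wraparound on the index.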