how floating point works

a description of the IEEE single-precision floating point standard

Comments

Fun fact about the imprecision of floating point: in an infinitely generating video game, traveling further and further out will eventually lead to your player “teleporting” around instead of walking, since the subtlety of your individual steps can no longer be represented once your player coords are that big. That's why some infinitely generated games move the world around the player instead of the player around the world: that way everything is processed close to the coordinates 0, 0, 0, where precision is greatest.
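You can see the effect directly with Python's math.ulp, which gives the spacing between adjacent doubles; the coordinate values here are made up for illustration:

```python
import math

near_spawn = 1.0
far_out = 1e16                     # hypothetical far-away player coordinate

print(math.ulp(near_spawn))        # ~2.22e-16: tiny steps are representable here
print(math.ulp(far_out))           # 2.0: adjacent doubles are two whole units apart
print(far_out + 0.001 == far_out)  # True: a small step is simply lost
```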

aliasd

For anyone curious: The generalization of a decimal point is known as a radix point.

codahighland

i've often found the "pretend you're inventing it for the first time" method of teaching to be really effective, and i feel like this video is just such an excellent case study. math is not my strong suit, but i still found it easy to follow because of that framing and the wonderful visualizations. thank you!

nickbb

4:43 It blew my mind the moment I realized this incredibly fluid and clean visualization was actually a RECORDING OF EXCEL as he typed in the function. I can’t imagine how these videos are made.

nerdporkspassmst

The reason 0 represents positive and 1 represents negative is that, with this choice, the signed and unsigned integer formats agree over their overlapping range, and the convention was carried over to floating point.
So for 8-bit integers, the range 0->127 can be represented in both the signed and unsigned formats. Because it's convenient to represent those values the same way in both, someone made the decision to do exactly that. The result is that positive numbers in signed integers have a leading 0 and negative numbers have a leading 1.
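That overlap is easy to check in Python, using int.to_bytes to get the raw encodings:

```python
# Every value both formats can hold has an identical byte encoding.
for n in range(128):
    assert n.to_bytes(1, "big", signed=True) == n.to_bytes(1, "big", signed=False)

# Negative values are where the leading bit flips to 1.
print((-3).to_bytes(1, "big", signed=True).hex())  # 'fd' = 0b11111101
```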

doommustard

Floating-point is a very carefully-thought-out format. Here are my favorite float tricks:

* You can generate a uniform random float between 1.0 and 2.0 by putting random bits into the mantissa and using a certain constant bit pattern for the exponent/sign bits. Then you can subtract 1 from it to get a uniform random value between 0 and 1. This is extremely fast *and* has a much more uniform distribution compared to the naive ways of generating random floats, like dividing a random integer by its maximum value. (See the sketch after this list.)

* GPUs can store texture RGBA pixels in a variety of formats, and some of them are floating-point. But engineers try as hard as possible to cram lots of data into a very small space, which leads to some esoteric formats. For example, the "R11G11B10" format stores each of the 3 components as an *unsigned* float, with 11 bits each for the Red and Green floats and 10 bits for the Blue float, to fit into a 32-bit pixel. Even weirder is "RGB9_E5". In this format, each of the three color channels uses a 14-bit unsigned float: 5 bits of exponent and 9 of mantissa. How does this fit into 32 bits? Because they share the same exponent! The pixel has 5 bits for the exponent, and then 9 bits for each mantissa.
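Both tricks sketch nicely in Python with the struct module. The first function assumes IEEE 754 single precision; the second assumes the usual OpenGL bit layout for RGB9_E5 (red in the low bits), so treat it as an illustration rather than a reference decoder:

```python
import random
import struct

def uniform01():
    # 0x3F800000 is the bit pattern of 1.0f (sign 0, exponent 127, mantissa 0).
    # Filling the 23 mantissa bits with random data yields a uniform float in
    # [1.0, 2.0); subtracting 1.0 shifts it to [0.0, 1.0).
    bits = 0x3F800000 | random.getrandbits(23)
    return struct.unpack("<f", struct.pack("<I", bits))[0] - 1.0

def decode_rgb9e5(texel):
    # One shared 5-bit exponent (bias 15), three 9-bit mantissas.
    r = texel & 0x1FF
    g = (texel >> 9) & 0x1FF
    b = (texel >> 18) & 0x1FF
    e = (texel >> 27) & 0x1F
    scale = 2.0 ** (e - 15 - 9)  # the 9 mantissa bits act as the fraction
    return (r * scale, g * scale, b * scale)

print(uniform01())                 # e.g. 0.2783437967300415
print(decode_rgb9e5(0x7800_01FF))  # (0.998..., 0.0, 0.0): near-full red
```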

bammam

Fun fact: some JavaScript engines (JavaScriptCore in Safari and SpiderMonkey in Firefox, for example) use the NaN-space to hold other datatypes such as integers, booleans and pointers. That way everything is a double.
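Here's the idea in miniature, sketched in Python; the tag layout is invented for illustration, since real engines each pick their own encoding:

```python
import struct

QNAN = 0x7FF8_0000_0000_0000  # quiet NaN: exponent all 1s, quiet bit set

def box_int(i):
    # Tuck a 32-bit integer into the unused low mantissa bits of a quiet NaN.
    bits = QNAN | (i & 0xFFFF_FFFF)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def unbox_int(d):
    bits = struct.unpack("<Q", struct.pack("<d", d))[0]
    return bits & 0xFFFF_FFFF

boxed = box_int(12345)
print(boxed != boxed)    # True: the boxed value really is a NaN
print(unbox_int(boxed))  # 12345: the payload survives the round trip
```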

kalleguld

Another reason for 0 being positive and 1 being negative: (-1)^0 is positive, and (-1)^1 is negative. More technically, it's because multiplication of signs acts like addition mod 2, with positive playing the role of the additive identity, 0. (So when multiplying floating-point numbers, the sign bit of the result is just the XOR of the sign bits of the multiplicands.)
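That XOR identity is easy to verify by pulling bit 31 out of the single-precision encoding:

```python
import struct

def sign_bit(x):
    # Bit 31 of the IEEE 754 single-precision encoding of x.
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 31

for a, b in [(-2.0, 3.0), (2.0, 3.0), (-2.0, -3.0)]:
    assert sign_bit(a * b) == sign_bit(a) ^ sign_bit(b)
```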

WhirligigStudios

I've always thought I'd love to have jan Misali as a teacher, but after watching them explain this completely foreign and complex topic to me so well, I know they'd be overqualified for any teacher's pay or compensation. Like, the w series was in-depth, but this has so many moving parts, and you communicated them all so well!

aggressivelymidtier

"And that's a really good question."

"The other problem with—"
😂
This wasn't just informative and really well-explained, but funny to boot!

emilyrln

IEEE 754 is amazing in how often it just works. "MATLAB's creator Dr. Cleve Moler used to advise foreign visitors not to miss the country's two most awesome spectacles: the Grand Canyon, and meetings of IEEE p754" -William Kahan

jezerozeroseven

This takes "Hey Siri, what's 0÷0?" to a whole new level

CommonCommiestudios

Wow, the information density here is high! This 18-minute video is essentially the first lecture of the semester in the Numerical Analysis course I took in my senior year as a math major, except that lecture was an hour long! The professor used floating point as a motivation to talk about different kinds of errors in lecture 2 (i.e., round-off vs. truncation), which honestly was a pretty effective framing.

bidaubadeadieu

Shout out to my favorite underused format, fixed-point. An old game creation library I used had it. It's great for 2D games where you want subpixel movement but not a lot of range ("I want it to move 16 pixels over 30 frames"). I found the limited precision perfect: you don't get funny rounding errors that leave things misaligned after moving them small distances.
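A minimal sketch of the idea, using a hypothetical 24.8 layout (the low 8 bits are the fraction, so positions are counted in 1/256ths of a pixel):

```python
FRAC_BITS = 8
ONE = 1 << FRAC_BITS     # 256 sub-units per pixel

step = (16 * ONE) // 30  # "move 16 pixels over 30 frames"
x = 0
for _ in range(30):
    x += step            # plain integer addition each frame

# Rounding happened exactly once, when step was computed; every frame after
# that is exact, so positions stay perfectly repeatable.
print(x // ONE, "pixels +", x % ONE, "/256ths")  # 15 pixels + 240 /256ths
```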

Hyreia

this is a good video!! I like the way you explain how this came to exist. it is a human thing made by humans, and as such it is messy and flawed but it _works_ . and I love that. it was created for a purpose, and it serves that purpose well.

you didn't do this, but I've seen people say that "computers can't store arbitrary-precision numbers", which frustrates me, because computers _can_ do that, they just need a different format. these tools are freedoms, not restrictions. if you want to perform a different task, then find different tools. and yeah, you probably won't ever need this much precision, but fields like finance exist, where a $0.01 error in a billion-dollar ledger is cause for concern, at the least. arbitrary-precision numbers do require arbitrary memory, but they're definitely possible.
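Python demonstrates this out of the box: its integers grow without bound, and the standard decimal module does exact decimal arithmetic at whatever precision you request, which is the kind of format finance code reaches for:

```python
from decimal import Decimal, getcontext

getcontext().prec = 50                    # 50 significant digits, just because we asked
print(Decimal("0.10") + Decimal("0.20"))  # 0.30 exactly; no 0.30000000000000004
print(2 ** 1000)                          # a 302-digit int, stored without rounding
```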

I think 0 is positive and 1 is negative because 0 is "default" and 1 is "special", like how main() returns 0 on success in C. it could also be because of how signed integers are stored, where 1111 1101 is -3 b/c of two's complement (I think?), which is mathematically justified by 2-adic numbers.

anyway good video, as always.

toricon

I already know how floating point works but I’m watching this anyway because I have PRINCIPLES and they include watching every jan Misali video

agcummings

11:23 "The caveat only being that they get increasingly less precise the closer you get to zero."

That isn't the only caveat. Subnormals are often handled by a separate trap handler in the processor, which can slow down processing quite a bit if a lot of subnormals appear.
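You can poke at the subnormal range from ordinary Python (though the performance cliff itself is a hardware effect you won't see through the interpreter's overhead):

```python
import sys

smallest_normal = sys.float_info.min  # ~2.2250738585072014e-308
subnormal = smallest_normal / 2       # below the normal range, yet still nonzero
print(subnormal)                      # 1.1125369292536007e-308
print(5e-324 / 2)                     # 0.0: the smallest subnormal underflows
```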

seneca

If I had to guess, 0 being positive and 1 being negative is a holdover from 2's complement binary representation.

For those uninitiated, 2's complement binary representation (2C) is a way to represent positive and negative whole numbers in binary that uses the leading bit as the sign bit. To showcase why this format exists, here's an example of writing -3 in 2C using 8 bits.

Step 1: Write |-3| in binary
|-3| = 3 = (0000 0011)

Step 2: Invert all of the bits
Inv(0000 0011) = 1111 1100

Step 3: Add 1
  1111 1100
+ 0000 0001
-----------
  1111 1101

-3 = (1111 1101)2C

Converting it back is the reverse process
Step 1: Subtract 1
  1111 1111
- 0000 0001
-----------
  1111 1110

Step 2: Invert all of the bits
Inv(1111 1110) = 0000 0001

Step 3: Convert to base of choice, 10 in this example, and multiply by -1
0000 0001 = 1
1 * -1 = -1

(1111 1111)2C = -1

The advantage of this form is that addition works the same regardless of the signs of the two numbers or their order. It does this by discarding the carry out of the top bit, which drops the negative sign whenever the result should be positive.

Example: -3 + 5

  1111 1101
+ 0000 0101
-----------
  0000 0010

2 = (0000 0010)2C

Example: -3 + 2

  1111 1101
+ 0000 0010
-----------
  1111 1111

-1 = (1111 1111)2C

It's ingenious how effortlessly this integrates with existing computer operations and how few glaring issues it has, unlike one's complement with its duplicate zero, or more naïve negative-number systems with their operational baggage.

To go back to the original statement, this system only works if the leading digit is a one for negatives, because if it were inverted, 0 would be (1000 0000)2C. This is not only unsatisfying to look at, but dangerous when you consider most bits in a computer are initialized to zero, which would be read in this hypothetical system as -128.
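In most languages the whole dance collapses to a mask, since ANDing with 0xFF performs the invert-and-add-1 wrap automatically; a small Python sketch:

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0xFF for an 8-bit machine

def to_2c(n):
    # Encode a (possibly negative) integer in 8-bit two's complement.
    return n & MASK

def from_2c(b):
    # Decode: if the leading bit is set, the value is b - 2**BITS.
    return b - (1 << BITS) if b & (1 << (BITS - 1)) else b

assert to_2c(-3) == 0b1111_1101
assert from_2c(0b1111_1111) == -1
# Addition works regardless of sign, with the carry out simply discarded:
assert (to_2c(-3) + to_2c(5)) & MASK == to_2c(2)
assert (to_2c(-3) + to_2c(2)) & MASK == to_2c(-1)
```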

Dawn_Cavalier

That barely audible mouse click to stop recording right after a nonsensical parting one-liner is just *chef's kiss*

leow

Microprocessor/IC enthusiast here.
5:33
The reason that a 1 in the sign bit of a signed integer means negative and a 0 means positive is that it saves a step when doing 2's complement arithmetic, which is how computers do subtraction (basically, you can turn any subtraction problem into an addition problem by flipping the bits of one of the numbers and adding 1, since computers can't natively subtract).
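In other words, a - b becomes a + (~b + 1). A quick sketch on a pretend 8-bit machine:

```python
MASK = 0xFF  # pretend 8-bit machine

def sub_via_add(a, b):
    # a - b computed as a + (~b + 1), reusing the adder for subtraction.
    return (a + ((~b + 1) & MASK)) & MASK

assert sub_via_add(5, 3) == 2
assert sub_via_add(3, 5) == 0b1111_1110  # -2 in two's complement
```
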

denverbeek