Clever Code: Fast Inverse Square Root

In this video we look at calculating the fast inverse square root of a number as featured in Quake III Arena!

Comments

Doing a memcpy doesn't cost much performance nowadays; compilers are pretty advanced, and -O1 is already a huge leap in performance. But you have to remember that back when this was implemented, a memcpy was very probably slow compared to direct casting. We are talking about 4-5 CPU cycles for each cast, and this code ran every frame.
We are no longer in the '90s, and there are CPU instructions specifically made to do inverse square roots.

Anyway, great video.

gerardmarquinarubio
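
For readers following the memcpy point above, here is a minimal sketch of what a memcpy-based version might look like. The function name `q_rsqrt` is illustrative, the constant 0x5f3759df is the one from the released Quake III source, and the code assumes that `float` is a 32-bit IEEE-754 value, which the C standard does not guarantee.

```c
#include <stdint.h>
#include <string.h>

/* Fast inverse square root using memcpy instead of pointer casts.
   Assumption: float is a 32-bit IEEE-754 binary32 value. */
float q_rsqrt(float number)
{
    uint32_t bits;
    float y = number;

    memcpy(&bits, &y, sizeof bits);        /* read the float's bit pattern */
    bits = 0x5f3759dfu - (bits >> 1);      /* magic-constant initial guess */
    memcpy(&y, &bits, sizeof y);           /* turn the bits back into a float */

    y = y * (1.5f - 0.5f * number * y * y); /* one Newton-Raphson step */
    return y;
}
```

With optimization enabled, current compilers typically turn both memcpy calls into plain register moves, so the cost the comment describes should mostly disappear.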

What we have to keep in mind is that we're basically talking about 1998 here (Quake III Arena was first officially released in December 1999, but development started more than a year before that). The Internet was still a new thing at the time, and there was certainly no Wikipedia! They basically had to figure everything out on their own. The algorithm was fairly new back then; I guess the implementation was unique to id Software, and that no other algorithm was known at the time, or at least none that was free of undefined behaviour. They worked with whatever was available.

CZghost

Yes, type punning has been deprecated, but it was actually OK when we worked on the Quake code.

TerjeMathisen

C++20 also introduces std::bit_cast(), basically a wrapper around memcpy().

tillslave

I think this video has missed the reason why this is undefined behaviour.
It's really because C / C++ do NOT specify what format the float / double / long double datatypes are stored in, and different compilers/architectures may (and some indeed do) store floats in different formats (or different sizes). Hence bit manipulation of floats will always result in 'undefined behaviour' in C / C++, since these languages are intended to be architecture-agnostic.
Memcpy is not 'the solution' here. Some inline assembly would probably have made more sense, since it would have made it obvious that the code was architecture-specific, and it would have failed to assemble on a 'non-supported' architecture.

bevanweiss

Wow, this inverse square root algorithm is also a very powerful way to calculate the square root of x itself!
y(n+1)=y(n)*(3/2-(0.5*x*y(n)*y(n))): no divisions, only multiplications.
The classical Newton iteration for sqrt(x) has a division x/y(n) in each iteration:
y(n+1)=y(n)-(y(n)-x/y(n))/2, which makes it slow in terms of CPU cycles.
After the last iteration of the inverse square root, you multiply the result by x,
because 1/sqrt(x) * x = sqrt(x).

andiback
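
The division-free iteration described above could be sketched roughly as follows. The function name `sqrt_via_rsqrt` and the `iterations` parameter are made up for illustration, the starting guess reuses the Quake III bit trick, and the code assumes x > 0 and a 32-bit IEEE-754 `float`.

```c
#include <stdint.h>
#include <string.h>

/* Approximates sqrt(x) for x > 0 with no division inside the loop.
   Assumption: float is a 32-bit IEEE-754 binary32 value. */
float sqrt_via_rsqrt(float x, int iterations)
{
    uint32_t bits;
    float y;

    /* Initial guess for 1/sqrt(x) from the float's bit pattern. */
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);

    /* y(n+1) = y(n) * (3/2 - 0.5 * x * y(n) * y(n)) */
    for (int n = 0; n < iterations; ++n)
        y = y * (1.5f - 0.5f * x * y * y);

    return x * y; /* x * (1/sqrt(x)) = sqrt(x) */
}
```

Each pass through the loop roughly squares the relative error, so two or three iterations are usually enough for single precision.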

If anyone is wondering, you can approximate a normal sqrt for double precision too...
Just shift the bits of the double right by one place and add a constant. The average error is about 1.5% and the maximum about 4%, according to my testing.
It works with Newton's method too.

panjak
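
One possible reading of the double-precision trick above, as a rough sketch only. The comment does not give its constant, so the value used here (half of the bit pattern of 1.0) is an untuned assumption, and its error figures will differ from the ones quoted. The function name `approx_sqrt` is illustrative, and the code assumes x > 0 and a 64-bit IEEE-754 `double`.

```c
#include <stdint.h>
#include <string.h>

/* Rough sqrt(x) for x > 0: halve the bit pattern, then re-add half the
   exponent bias. Assumption: double is a 64-bit IEEE-754 binary64 value.
   The constant is untuned and chosen only so that 1.0 maps to 1.0. */
double approx_sqrt(double x)
{
    uint64_t bits;

    memcpy(&bits, &x, sizeof bits);
    bits = (bits >> 1) + 0x1FF8000000000000ull; /* 0x3FF0000000000000 >> 1 */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

The result can be fed into a Newton step such as y = 0.5 * (y + x / y) when more accuracy is needed.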

Nice video! Even better channel! Thanks, YouTube recommendation algorithm, for letting me find this. Subscribed!

ericsnakey

Might I suggest a high-pass filter for your microphone while recording? Say, set at 80 Hertz. I'm watching in my car, and my sub is really highlighting the desk vibrations being captured by your mic.

briansepolen

Your non-undefined behaviour function is still actually undefined behavior, because "The absolute size of built-in floating-point types isn't specified in the standard." So by copying "sizeof(float)" bytes, you may actually be copying something smaller or larger than that "int" or even a "long" type. It is not the cast operation that creates the actual undefined behavior; it is the fact that the data types being used may not even be the same size on every platform. "float" isn't guaranteed to be 4 bytes, and neither are the "int" or "long" types. You would have to use some kind of template specializations to do this without any undefined behavior at all, but most people would probably just create a preprocessor macro to switch out the function on platforms it is incompatible with.
Edit: But even using templates would be undefined behavior, because there is no guarantee that the float storage type is in IEEE-754 format... we can't win.

rsn
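
One way to at least make the size and format assumptions above explicit is to check them at compile time, so an incompatible platform fails to build instead of misbehaving. This is only a sketch using C11 `_Static_assert` and the `<float.h>` limits; it checks the parameters of the format, not the exact bit layout.

```c
#include <float.h>
#include <stdint.h>

/* Refuse to compile where the bit trick's assumptions do not hold. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "float must be exactly 32 bits wide");
_Static_assert(FLT_RADIX == 2 && FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128,
               "float must have IEEE-754 binary32 parameters");
```

A preprocessor fallback to the standard library's square root on platforms that fail these checks would be the macro approach the comment mentions.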

Haha that "what the fuck" comment in the code on Wiki

multimevil

If it works, what does it matter, undefined or not, if the assembly code ends up the same?

rollmeister

What are the accuracy differences? Is the built-in Intel intrinsic perfectly accurate compared with sqrt(x)?

If you insist on normalized input (say, between 0+ and 1), and you only want 1% accuracy, then why not implement a lookup table? That might be even faster.

mb-faze
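
The accuracy question above can be explored for the Quake-style function with a small measurement sketch. It does not cover the hardware intrinsic. The helper names `q_rsqrt` and `max_residual` are illustrative, the input range must be positive, and a 32-bit IEEE-754 `float` is assumed. The code reports the residual |x*y*y - 1|, which is about twice the relative error of y and needs no library square root.

```c
#include <stdint.h>
#include <string.h>

/* Quake-style fast inverse square root (assumes IEEE-754 binary32 float). */
static float q_rsqrt(float number)
{
    uint32_t bits;
    float y = number;

    memcpy(&bits, &y, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * number * y * y); /* one Newton step */
}

/* Largest |x*y*y - 1| over evenly spaced samples in [lo, hi], lo > 0.
   For a perfect y = 1/sqrt(x), the residual would be zero. */
double max_residual(float lo, float hi, int steps)
{
    double worst = 0.0;

    for (int i = 0; i <= steps; ++i) {
        float x = lo + (hi - lo) * ((float)i / (float)steps);
        double y = (double)q_rsqrt(x);
        double residual = (double)x * y * y - 1.0;

        if (residual < 0.0)
            residual = -residual;
        if (residual > worst)
            worst = residual;
    }
    return worst;
}
```

Calling `max_residual(0.01f, 1.0f, 100000)` covers the normalized range suggested above. The same loop could be pointed at a lookup table to compare the two approaches.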

what does intrinsic mean in this context?

yoshi

Which part of this video is about the actual fast inverse square root function?

arteme

That is not "undefined behaviour". Who came up with that crap? O_o

emmepombar

I found something interesting about square roots: when you divide a square root by its square and multiply by 100 to get a percentage, the result is always the same value you would get by dividing 100 by the square root. If someone figures out a way to calculate that percentage from the square number, there will be a consistent way to calculate the square root with no hacks or guesswork.

zxuiji