Clever Code: Fast Inverse Square Root


In this video we look at calculating the fast inverse square root of a number as featured in Quake III Arena!


Comments

Doing a memcpy doesn't cost much performance nowadays: compilers are pretty advanced, and -O1 is already a huge leap in performance. But you have to remember that back in the days when this was implemented, it was very probable that a memcpy was actually pretty slow compared to direct casting. We are talking about 4-5 CPU cycles for each cast, multiplied by the fact that this code ran every frame.
We are no longer in the '90s, and there are CPU instructions made specifically to compute inverse square roots.

Anyways great video.

gerardmarquinarubio
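
As a rough illustration of the memcpy approach discussed in the comment above, here is a minimal sketch in C. It assumes `float` is a 32-bit IEEE-754 value; the function name `q_rsqrt` is chosen for illustration only, and this is not presented as the exact code from the game or the video.

```c
#include <stdint.h>
#include <string.h>

/* Approximates 1/sqrt(x) for positive, finite x.
   Assumption: float is a 32-bit IEEE-754 binary32 value. */
float q_rsqrt(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);       /* copy the float's bytes instead of casting pointers */
    bits = 0x5f3759dfu - (bits >> 1);     /* magic-constant initial guess */

    float y;
    memcpy(&y, &bits, sizeof y);          /* copy the bytes back into a float */
    y = y * (1.5f - 0.5f * x * y * y);    /* one Newton-Raphson refinement step */
    return y;
}
```

Calling `q_rsqrt(4.0f)` should give a value close to 0.5. Optimizing compilers typically turn a fixed-size `memcpy` like this into a plain register move, which is the point the comment makes about modern toolchains.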

What we have to keep in mind is that we're basically talking about 1998 here (Quake III Arena was first officially released in December 1999, but development started more than a year before that). The Internet was still a new thing at that time, and there was certainly no Wikipedia! They basically had to figure everything out on their own. The algorithm was fairly new back then. I guess the implementation was unique to id Software, and that no other algorithm was known at the time, or at least none that was free of undefined behaviour. They worked with whatever was available.

CZghost

Yes, type punning has been deprecated, but it was actually OK when we worked on the Quake code.

TerjeMathisen

C++20 also introduces std::bit_cast(), basically a wrapper around memcpy().

tillslave

I think this video has missed the reason why this is Undefined Behaviour.
It's really because C / C++ does NOT specify what format the float / double / long double datatypes are stored in, and different compilers/architectures may (and some indeed do) store floats in different formats (or different sizes). Hence bit manipulation of floats will always result in 'undefined behaviour' in C / C++, since these languages are intended to be architecture agnostic.
Memcpy is not 'the solution' here. Some inline assembly would probably have made more sense, since it would have made it obvious that the code was architecture specific, and it would have failed to assemble on a 'non-supported' architecture.

bevanweiss

Wow, this inverse square root algorithm is also a very powerful way to calculate the square root of x itself!
y(n+1) = y(n)*(3/2 - 0.5*x*y(n)*y(n)) contains no divisions, only multiplications.
The classical Newton iteration for sqrt(x) has a division, x/y(n), in each iteration:
y(n+1) = y(n) - (y(n) - x/y(n))/2, which makes it cost many more CPU cycles to reach the square root of x.
After the last iteration of the inverse square root, you multiply the result by x,
because 1/sqrt(x) * x = sqrt(x).

andiback
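
The division-free iteration described in the comment above can be sketched as follows. This is only an illustration under stated assumptions: `float` is taken to be IEEE-754 binary32, the initial guess reuses the well-known 0x5f3759df bit trick, and the name `sqrt_via_rsqrt` and its `iterations` parameter are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Approximates sqrt(x) for positive, finite x using only multiplications
   inside the loop. Assumption: float is IEEE-754 binary32. */
float sqrt_via_rsqrt(float x, int iterations)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);         /* initial guess for 1/sqrt(x) */

    float y;
    memcpy(&y, &bits, sizeof y);

    for (int n = 0; n < iterations; ++n)
        y = y * (1.5f - 0.5f * x * y * y);    /* y(n+1) = y(n)*(3/2 - 0.5*x*y(n)*y(n)) */

    return x * y;                             /* x * 1/sqrt(x) = sqrt(x) */
}
```

With three iterations, `sqrt_via_rsqrt(9.0f, 3)` should land very close to 3. Each extra iteration roughly squares the relative error, so a small, fixed iteration count is usually enough for single precision.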

If anyone is wondering, you can approximate normal sqrt for double precision too...
Just add and bits of double shifted to the right by one place. Average error is about 1.5% and max about 4% according to my testing.
Works with Newton's method too.

panjak
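
One way to read the comment above is the classic "halve the exponent" bit trick. The sketch below is an assumption about what was meant, not the commenter's code: the additive constant 0x1FF8000000000000 (half the bit pattern of 1.0) is an untuned choice made here for illustration, and `double` is assumed to be IEEE-754 binary64. The name `approx_sqrt` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Rough sqrt(x) for positive, finite, normal x.
   Assumptions: double is IEEE-754 binary64; the constant is untuned. */
double approx_sqrt(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    /* Shifting right by one halves the biased exponent; adding half the
       bit pattern of 1.0 restores the exponent bias. */
    bits = (bits >> 1) + 0x1FF8000000000000ull;

    double y;
    memcpy(&y, &bits, sizeof y);
    return y;
}
```

With this untuned constant the result is exact for even powers of two (for example `approx_sqrt(4.0)` gives 2.0) and overestimates by up to about 6% elsewhere (for example `approx_sqrt(2.0)` gives 1.5). A tuned constant would reduce the maximum error, and a Newton step such as `y = 0.5 * (y + x / y)` can refine the result further.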

Nice video! Even better channel! Thanks, YouTube recommendation algorithm, for letting me find this. Subscribed!

ericsnakey

Might I suggest a high-pass filter for your microphone while recording? Say, set at 80 Hertz. I'm watching in my car, and my sub is really highlighting the desk vibrations being captured by your mic.

briansepolen

Your non-undefined behaviour function is actually still undefined behavior because "The absolute size of built-in floating-point types isn't specified in the standard." So by copying "sizeof(float)" bytes you may actually be copying something smaller or larger than that "int" or even a "long" type. It is not the cast operation that creates the actual undefined behavior; it is the fact that the data types being used may not even be the same size on every platform. "float" isn't guaranteed to be 4 bytes, and neither are the "int" or "long" types. You would have to use some kind of template specialization to do this without any undefined behavior at all, but most people would probably just create a preprocessor macro to swap out the function on platforms it is incompatible with.
Edit: But even using templates would be undefined behavior, because there is no guarantee that the float storage type is in IEEE-754 format... we can't win.

rsn
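
The portability worries raised in the comments above (the size and format of `float`) can at least be turned into build failures rather than silent misbehaviour. The following is a sketch, assuming a C11 compiler; the specific checks and the helper name `float_is_binary32` are illustrative choices, not something taken from the video.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Compile-time guards (C11): refuse to build where the bit trick's
   assumptions about float do not hold. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "float must be exactly 32 bits wide");
_Static_assert(FLT_RADIX == 2 && FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128,
               "float does not look like IEEE-754 binary32");

/* Run-time sanity check: 1.0f must have the binary32 bit pattern 0x3f800000. */
int float_is_binary32(void)
{
    const float one = 1.0f;
    uint32_t bits;
    memcpy(&bits, &one, sizeof bits);
    return bits == 0x3f800000u;
}
```

On a platform where these assumptions fail, compilation stops with the given message, which is similar in spirit to the preprocessor-switch idea mentioned in the comment.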

Haha that "what the fuck" comment in the code on Wiki

multimevil

If it works, what does it matter if assembly code ends up the same. Undefined or not.

rollmeister

What are the accuracy differences? Is the built-in Intel intrinsic perfectly accurate compared with sqrt(x)?

If you insist on normalized input (say between 0+ and 1), and you only want 1% accuracy, then why not implement a look up table. That might be even faster.

mb-faze

What does intrinsic mean in this context?

yoshi

Which part of this video is about the actual fast inverse square root function?

arteme

That is not "undefined behaviour". Who came up with that crap? O_o

emmepombar

I found something interesting about square roots: when you divide a square root by its square and multiply by 100 to get a percentage, the percentage is always the same value you would get if you divided 100 by the square root. If someone figures out a way to calculate that percentage from the square number, then there will be a consistent way to calculate the square root with no hacks or guesswork.

zxuiji
