A computation which occurs often in applications such as graphics is

normalizing a vector. This requires both the calculation of a square

root and a floating-point division—both of which are expensive

operations. The following documents the implementation of an

algorithm which computes a relatively fast inverse square root using

simpler operations. This is documented more clearly on the

Wikipedia page Fast inverse square root.

The Basic Algorithm

The source code for the basic algorithm is

float inv_sqrt( float x ) {
    int xi = *reinterpret_cast<int *>( &x );  // treat the bits of x as an integer
    xi = INV_SQRT_N - (xi >> 1);              // the magic: shift and subtract
    return *reinterpret_cast<float *>( &xi ); // treat the bits as a float again
}

where INV_SQRT_N is a magic number chosen to minimize the
error. Depending on which error one would like to minimize, be it
the 1-, 2-, or ∞-norm, a different magic number is needed,
as is shown in Table 1.

Note: newcomers to C++ may wonder about the rather odd operation

of taking the address of a float, casting it to a pointer to an integer,

and then dereferencing this pointer:

*reinterpret_cast<int *>( &x ).

We cannot use static_cast<int>(x) because this will

convert the floating-point number, truncating the value toward

zero. The above formulation explicitly states: treat the bits of

the floating-point number as if they were the bits of an integer.

Table 1. Optimal magic numbers for the three norms.

Magic Number INV_SQRT_N | 1-Norm of Relative Error | 2-Norm of Relative Error | ∞-Norm of Relative Error
1597203179 | 0.01594 | 0.02224 | 0.05055
1597294787 | 0.01715 | 0.02093 | 0.04482
1597465647 | 0.02339 | 0.02528 | 0.03421

Figures 1 through 3 show the relative error of the approximations

of 1/√x for values of x between 0.5 and 8. You

will notice that the relative error repeats for each power of 4. Figures 1, 2, and

3 plot the error when using the constant which minimizes the

1-, 2-, and ∞-norms, respectively.

Figure 1. The relative error for the coefficient minimizing the 1-norm of the relative error.

Figure 2. The relative error for the coefficient minimizing the 2-norm of the relative error.

Figure 3. The relative error for the coefficient minimizing the ∞-norm of the relative error.

Applying Newton’s Method

Applying one iteration of Newton’s method reduces the error significantly.

float inv_sqrt_newton( float x ) {
    float const half_x0 = 0.5f*x;
    int xi = *reinterpret_cast<int *>( &x );
    xi = INV_SQRT_NEWTON_N - (xi >> 1);
    x = *reinterpret_cast<float *>( &xi );
    return x*(1.5f - half_x0*x*x);  // one Newton iteration
}
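The final line is one Newton iteration for the root of f(y) = 1/y² − x; written out, the step needs no division:

```latex
y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)}
        = y_n + \frac{y_n^{-2} - x}{2\,y_n^{-3}}
        = y_n\left(\tfrac{3}{2} - \tfrac{x}{2}\,y_n^{2}\right)
```

which is exactly x*(1.5f - half_x0*x*x), with the result of the bit trick playing the role of y_n.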

where INV_SQRT_NEWTON_N is a magic number chosen to minimize the
error. Depending on which error one would like to minimize, be it
the 1-, 2-, or ∞-norm, a different magic number is needed,
as is shown in Table 2.

Table 2. Optimal magic numbers for the three norms with one step of Newton’s method.

Magic Number INV_SQRT_NEWTON_N | 1-Norm of Relative Error | 2-Norm of Relative Error | ∞-Norm of Relative Error
1597292357 | 0.0006520 | 0.001078 | 0.002988
1597376322 | 0.0007246 | 0.0009483 | 0.002338
1597463175 | 0.0009549 | 0.001118 | 0.001751

The last number is one smaller than the number recommended by

David Eberly in his 2002 paper.

Figures 4 through 6 show the relative error of the approximations

of 1/√x for values of x between 0.5 and 8 for this modification. As

with the standard function, the relative error repeats for each power of 4. Figures 4, 5, and

6 plot the error when using the constant which minimizes the

1-, 2-, and ∞-norms, respectively.

Figure 4. The relative error for the coefficient minimizing the 1-norm of the relative error with Newton’s method.

Figure 5. The relative error for the coefficient minimizing the 2-norm of the relative error with Newton’s method.

Figure 6. The relative error for the coefficient minimizing the ∞-norm of the relative error with Newton’s method.

At the cost of five floating-point operations (four multiplications and one subtraction),

the error is reduced by a factor of 20.

A Proposed Further Optimization

Looking at Figures 4-6, one may note that the relative

error is nonnegative after one application of Newton’s method. This is

a natural consequence of Newton’s method and is not because the image is

drawing the absolute value of the error. To demonstrate this visually,

consider Figure 7, which shows 1/x² − 2, the root

of which is 1/√2.

Figure 7. The function 1/x² − 2 with root 1/√2 ≈ 0.7071.

An underestimate which is sufficiently close to the actual value

will continue to underestimate the root, as is shown in Figure 8 which

starts with the approximation 0.6 and, after one iteration of Newton’s

method, is 0.684.

Figure 8. The approximation 0.6 to the value 1/√2 and one

iteration of Newton’s method.

An overestimate which is sufficiently close to the actual value, however,

will underestimate the root with the next approximation, as is shown in Figure 9 which

starts with the approximation 0.8 and, after one iteration of Newton’s

method, is 0.688···.

Figure 9. The approximation 0.8 to the value 1/√2 and one

iteration of Newton’s method.

Consequently, one further multiplication

can be used to reduce the error by approximately another 50% by centering

the result. Although the multiplier is applied to

the result, it can be absorbed into two constants

which are already being used; in essence, we get the multiplication

for free.

float inv_sqrt_multiplier( float x ) {
    float const mx = 0.5f*MULTIPLIER*x;
    int xi = *reinterpret_cast<int *>( &x );
    xi = INV_SQRT_NEWTON_N - (xi >> 1);
    x = *reinterpret_cast<float *>( &xi );
    return x*(1.5f*MULTIPLIER - mx*x*x);  // Newton step with the multiplier folded in
}

Using the appropriate multipliers reduces the error of the

approximation, as is shown in Figures 10 through 12.

Figure 10. The relative error for the coefficient minimizing the 1-norm of the relative error with Newton’s method and a multiplier.

Figure 11. The relative error for the coefficient minimizing the 2-norm of the relative error with Newton’s method and a multiplier.

Figure 12. The relative error for the coefficient minimizing the ∞-norm of the relative error with Newton’s method and a multiplier.

Table 3. Optimal multipliers for the three norms with one step of Newton’s method.

Norm | MULTIPLIER | Error
1-Norm | 1.000363245811462 | 0.0005151
2-Norm | 1.000724768371582 | 0.0006122
∞-Norm | 1.000876311302185 | 0.0008765

What is possibly surprising is that this does not seem to increase the run time, as there are already

a number of floating-point multiplications into which this additional operation can be folded.

Trial Runs

Approximating the integral of 1/sqrt(x) using a Riemann sum from 0 to 2^22, we get the

following run times:

- Using sqrt and floating-point division: 3.396s.
- Using inv_sqrt_multiplier: 1.147s.
- Using inv_sqrt_newton: 1.118s.
- Using inv_sqrt: 0.837s.

While papers claim that the time is reduced to 25% of that of the direct computation,

I only get a reduction to 33%; however, I will assume that is a result of my ignorance.

What is fascinating, however, is that the addition of the multiplier degrades

the performance of Newton’s method by less than 3%.

The Magic Number of 1597463007/0x5f3759df

The recommended magic number 1597463007 does not appear to minimize any of the norms

listed above. The Wikipedia page and the references therefrom do not shed further light

on why this specific value was chosen.

It would be interesting to know why that value was chosen over

a value which minimizes the relative error subject to a specific norm.

Chris Lomont suggests another constant which is only one unit away

from the optimal constant minimizing the ∞-norm.

Function Plots

Figures 13 and 14 plot 1/√x versus inv_sqrt(x) and

inv_sqrt_multiplier(x), respectively. The first image shows clearly

that the approximation is not very good; however, the second is almost visually

indistinguishable.

Figure 13. A plot of 1/√x and inv_sqrt(x) on [0.25, 4].

Figure 14. A plot of 1/√x and inv_sqrt_multiplier(x) on [0.25, 4].

Acknowledgements

Thanks to Ryan Fox for suggesting this topic.