home


In Search of a Fast Cube Root


Following are thoughts and observations regarding the machine extraction of cube roots with a particular emphasis on speed of calculations.

Cube roots are a topic of interest, because they're used in many common formulae and a fast cube root function isn't included with Microsoft Visual Studio.

In the absence of a special cube root function, a typical strategy is calculation via a power function (e.g., pow(x, 1.0/3.0)). This can be problematic in terms of speed and in terms of accuracy when negative numbers aren't handled properly.

All of the implementations of cube root functions I discovered rely on Newton's method.  Notable implementations include William. M. Kahan's cube root function as found in the BSD sources (kahan's cbrt) and Ken Turkowski's as presented in an Apple technical report (CubeRoot). Kahan's implementation is particularly robust, handling negative numbers, NaN, INF and subnormals. Turkowski's paper includes interesting facts and insights.  



Cube Root by Newton's method

Given an approximation ‘a’ of the cube root of ‘R’, the following formula will produce ‘b’, a closer approximation to the cube root of ‘R’.



or

 

Repeated application yields results progressively closer to the true cube root of R, with the number of bits of precision doubling with each iteration. Consequently, the accuracy of the initial guess fed into the iteration is especially important. Desired degrees of precision can be achieved with fewer iterations when initial estimates are more accurate. Computationally, each iteration is relatively inexpensive: one divide, two multiplies and two adds.

Turkowski's approach factors the cube root of R into the product of a power of two and a cube root of a small value with a limited range. He then uses a quadratic approximation to estimate the cube root of the small value. This approach yields an initial value with approximately 6 bits of precision to seed the Newton-Raphson iteration.

Kahan uses a bit hack to generate an initial estimate with 5 bits of precision. The approach exploits the properties of IEEE 754 floating point numbers by leveraging the fact that their binary representation is close to a log2 reppresentation. The following code demonstrates the technique applied for the double precision case. (Note: Kahan's code also includes support for handling subnormal numbers. The following code assumes 32-bit integers).


inline double cbrt_5d(double d)
{
   const unsigned int B1 = 715094163;
   double t = 0.0;
   unsigned int* pt = (unsigned int*) &t;
   unsigned int* px = (unsigned int*) &d;
   pt[1]=px[1]/3+B1;
   return t;



General Nth Root Approximations

I've generalized this approach to provide initial approximations for nth roots generally for IEEE 754 floats and doubles in the following:

template <int n>
inline float nth_rootf(float x)
{
   const int ebits = 8;
   const int fbits = 23;

   int& i = (int&) x;
   const int bias = (1 << (ebits-1))-1;
   i = (i - (bias << fbits)) / n + (bias << fbits);

   return x;
}

template <int n>
inline double nth_rootd(double x)
{
   const int ebits = 11;
   const int fbits = 52;

   _int64& i = (_int64&) x;
   const _int64 bias = (1 << (ebits-1))-1;
   i = (i - (bias << fbits)) / n + (bias << fbits);

   return x;
}


To use the code to generate an nth root approximation, n is specified as a C++ template parameter. For example, a float approximation for the cube root of 5.0 can be can be generated with the call
nth_rootf<3>(5.0f). In my experience, these functions provide estimates in the range of 5 to 6 bits of precision.



Halley's Method applied to Cube Roots

Recent investigations into faster converging cube root approximations led to a method offered by Otis E. Lancaster in Machine Method for the Extraction of a Cube Root
1. Lancaster’s method is calculated relatively efficiently, and it exhibits cubic convergence in contrast to the quadratic convergence of Newton's method. It takes the following form (where ‘b’ is a closer approximation to the cube root of R than ‘a’).




The method is actually an application of a method devised by comet discoverer Edmond Halley (see Halley's Method).  Over the years it has been repeatedly rediscovered and attributed to others including Lambert and Bailey. With slight massaging, the equation can be reduced to the following:
 



In this form, the calculation cost amounts to one divide, three multiplies and three additions. Although this exceeds Newton-Raphson by a multiply and an add, speed improvements can be reaped thanks to the faster rate at which Halley’s method converges and the relatively high cost of divide, which is a factor in each iteration of either method.


Comparison of Various Methods

 

Following are times and accuracy estimates of various methods including the bit hack approximation (cbrt_5f, cbrt_5d), up to three iterations of Halley’s method and up to four iterations of Newton’s methods. These results were obtained on an Intel Core Duo system using C++ compiled with Microsoft Visual Studio 2005. 

 

In the tests, one million cube roots are calculated uniformly across the range from 0.0 to 1.0. Times are reported in milliseconds. The second column indicates an estimate of the minimum number of bits of precision in the worst case found. The last column indicates an estimate of the average number of bits of precision per evaluation. 

 

The cbrt_5f and cbrt_5d functions were used to seed the iterative methods. It’s important to note that a better approximation will converge even more quickly. Given the strategy used, a single iteration of Halley’s method appears sufficient for 16 bits of precision. Also, note the costs of various methods in comparison to the standard pow( ) function.

 

 

32-bit float tests

----------------------------------------

cbrt_5f      8.8 ms    5 mbp   6.223 abp

pow        144.5 ms   23 mbp  23.000 abp

halley x 1  31.8 ms   15 mbp  18.961 abp

halley x 2  59.0 ms   23 mbp  23.000 abp

newton x 1  23.4 ms   10 mbp  12.525 abp

newton x 2  48.9 ms   20 mbp  22.764 abp

newton x 3  72.0 ms   23 mbp  23.000 abp

newton x 4  89.6 ms   23 mbp  23.000 abp

 

 

64-bit double tests

----------------------------------------

cbrt_5d     23.5 ms    5 mbp   6.299 abp

pow        151.3 ms   52 mbp  52.000 abp

halley x 1  56.5 ms   16 mbp  19.503 abp

halley x 2  95.3 ms   47 mbp  51.575 abp

halley x 3 122.1 ms   51 mbp  51.965 abp

newton x 1  48.8 ms   10 mbp  12.596 abp

newton x 2  78.6 ms   20 mbp  25.205 abp

newton x 3 104.5 ms   40 mbp  47.681 abp

newton x 4 134.7 ms   51 mbp  51.992 abp

 

 

The source code is available here.



1. Otis E. Lancaster, Machine Method for the Extraction of Cube Root Journal of the American Statistical Association, Vol. 37, No. 217. (Mar., 1942), pp. 112-115.