floating point
square root
numerical methods
algorithm
computational mathematics

Determining Floating Point Square Root

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Computing the square root of a floating-point number is one of the most fundamental numerical operations. Modern CPUs have dedicated hardware instructions (like x86's fsqrt or ARM's vsqrt), but understanding the algorithms behind them — Newton's method (Babylonian method), the fast inverse square root trick, and bit-manipulation initial guesses — is essential for embedded systems, GPU shaders, and understanding numerical precision. The standard library sqrt() function uses hardware instructions when available and falls back to software implementations.

Newton's Method (Babylonian Method)

The most widely used algorithm. Given a number S, find x such that x^2 = S:

python
1def sqrt_newton(S, tolerance=1e-10):
2    if S < 0:
3        raise ValueError("Cannot compute square root of negative number")
4    if S == 0:
5        return 0.0
6
7    # Initial guess: S/2 (could be better, see below)
8    x = S / 2.0
9
10    while True:
11        # Newton's iteration: x_new = (x + S/x) / 2
12        x_new = (x + S / x) / 2.0
13        if abs(x_new - x) < tolerance:
14            return x_new
15        x = x_new
16
17print(sqrt_newton(2))    # 1.4142135623730951
18print(sqrt_newton(144))  # 12.0
19print(sqrt_newton(0.01)) # 0.1

Each iteration roughly doubles the number of correct digits (quadratic convergence).

Better Initial Guess Using IEEE 754

A smarter initial guess reduces the number of iterations from ~20 to ~4:

python
1import struct
2
3def sqrt_fast(S):
4    if S <= 0:
5        return 0.0
6
7    # Extract bits of the float
8    bits = struct.unpack('I', struct.pack('f', S))[0]
9
10    # Halve the exponent (right-shift by 1) for an approximate sqrt
11    # This gives a rough estimate within a factor of 2
12    bits = (bits >> 1) + 0x1FC00000  # Magic constant for float32
13
14    x = struct.unpack('f', struct.pack('I', bits))[0]
15
16    # Two Newton iterations for full float32 precision
17    x = (x + S / x) / 2.0
18    x = (x + S / x) / 2.0
19
20    return x
21
22print(sqrt_fast(2.0))   # 1.4142135381698608
23print(sqrt_fast(100.0))  # 10.0

Fast Inverse Square Root

The famous Quake III algorithm computes 1/sqrt(x) using bit-level tricks:

c
1// C implementation — the legendary fast inverse square root
2float fast_inv_sqrt(float number) {
3    long i;
4    float x2, y;
5    const float threehalfs = 1.5f;
6
7    x2 = number * 0.5f;
8    y = number;
9    i = *(long *)&y;                // Interpret float bits as integer
10    i = 0x5f3759df - (i >> 1);     // Magic constant
11    y = *(float *)&i;              // Back to float
12    y = y * (threehalfs - (x2 * y * y));  // One Newton iteration
13    // y = y * (threehalfs - (x2 * y * y));  // Optional second iteration
14
15    return y;
16}
17
18// sqrt(x) = x * (1/sqrt(x))
19float fast_sqrt(float x) {
20    return x * fast_inv_sqrt(x);
21}

Python equivalent:

python
1import struct
2
3def fast_inv_sqrt(number):
4    x2 = number * 0.5
5    # Interpret float bits as int
6    i = struct.unpack('i', struct.pack('f', number))[0]
7    i = 0x5f3759df - (i >> 1)  # Magic constant
8    y = struct.unpack('f', struct.pack('i', i))[0]
9    # Newton iteration
10    y = y * (1.5 - x2 * y * y)
11    return y
12
13print(1.0 / fast_inv_sqrt(2.0))  # ~1.4142

Binary Search Method

Simple and works for integer or fixed-point square roots:

python
1def sqrt_binary_search(S, tolerance=1e-10):
2    if S < 0:
3        raise ValueError("Negative input")
4    if S < 1:
5        lo, hi = S, 1.0
6    else:
7        lo, hi = 1.0, S
8
9    while hi - lo > tolerance:
10        mid = (lo + hi) / 2.0
11        if mid * mid > S:
12            hi = mid
13        else:
14            lo = mid
15
16    return (lo + hi) / 2.0
17
18print(sqrt_binary_search(2))   # 1.4142135623724243
19print(sqrt_binary_search(0.5)) # 0.7071067811865234

Convergence is linear (one bit of precision per iteration), much slower than Newton's quadratic convergence.

Integer Square Root

For exact integer square roots (floor of sqrt):

python
1def isqrt(n):
2    """Integer square root using Newton's method."""
3    if n < 0:
4        raise ValueError("Square root not defined for negative numbers")
5    if n == 0:
6        return 0
7    x = n
8    y = (x + 1) // 2
9    while y < x:
10        x = y
11        y = (x + n // x) // 2
12    return x
13
14print(isqrt(144))  # 12
15print(isqrt(150))  # 12 (floor)
16print(isqrt(10**18))  # 1000000000
17
18# Python 3.8+ has math.isqrt
19import math
20print(math.isqrt(144))  # 12

Using Standard Library

python
1import math
2
3# Standard sqrt (uses hardware instruction)
4print(math.sqrt(2))        # 1.4142135623730951
5
6# For complex numbers
7import cmath
8print(cmath.sqrt(-1))      # 1j
9
10# NumPy (vectorized)
11import numpy as np
12arr = np.array([1, 4, 9, 16, 25])
13print(np.sqrt(arr))        # [1. 2. 3. 4. 5.]

Common Pitfalls

  • Negative input: sqrt(-1) raises ValueError in Python's math.sqrt. Use cmath.sqrt for complex results. NumPy's np.sqrt(-1) returns nan with a warning.
  • Floating-point precision: sqrt(2)**2 does not equal exactly 2.0 due to IEEE 754 rounding. Use math.isclose(result**2, S) for comparisons, not ==.
  • Poor initial guess: Newton's method converges from any positive guess, but a bad guess (like x0 = 1e15 for sqrt(4)) requires many more iterations. Use the exponent-halving trick for a good starting point.
  • Integer overflow in bit tricks: The fast inverse square root trick assumes 32-bit float and 32-bit int. Using it with 64-bit doubles requires a different magic constant (0x5fe6eb50c7b537a9).
  • Division by zero in Newton's iteration: If the initial guess is 0, S / x causes division by zero. Always check for S == 0 before entering the iteration loop.

Summary

  • Newton's method (x = (x + S/x) / 2) converges quadratically — the standard algorithm for sqrt
  • Use IEEE 754 bit manipulation for a good initial guess, reducing iterations to 3-4 for full precision
  • The fast inverse square root (0x5f3759df trick) is historical — modern CPUs have dedicated sqrt instructions
  • math.sqrt() uses hardware instructions and is always the best choice for production code
  • For integer square roots, use math.isqrt() (Python 3.8+) for exact floor results

Course illustration
Course illustration

All Rights Reserved.