Introduction
Computing the square root of a floating-point number is one of the most fundamental numerical operations. Modern CPUs have dedicated hardware instructions (like x86's fsqrt or ARM's vsqrt), but understanding the algorithms behind them — Newton's method (Babylonian method), the fast inverse square root trick, and bit-manipulation initial guesses — is essential for embedded systems, GPU shaders, and understanding numerical precision. The standard library sqrt() function uses hardware instructions when available and falls back to software implementations.
Newton's Method (Babylonian Method)
The most widely used algorithm. Given a number S, find x such that x^2 = S:
1def sqrt_newton(S, tolerance=1e-10):
2 if S < 0:
3 raise ValueError("Cannot compute square root of negative number")
4 if S == 0:
5 return 0.0
6
7 # Initial guess: S/2 (could be better, see below)
8 x = S / 2.0
9
10 while True:
11 # Newton's iteration: x_new = (x + S/x) / 2
12 x_new = (x + S / x) / 2.0
13 if abs(x_new - x) < tolerance:
14 return x_new
15 x = x_new
16
17print(sqrt_newton(2)) # 1.4142135623730951
18print(sqrt_newton(144)) # 12.0
19print(sqrt_newton(0.01)) # 0.1
Each iteration roughly doubles the number of correct digits (quadratic convergence).
Better Initial Guess Using IEEE 754
A smarter initial guess reduces the number of iterations from ~20 to ~4:
1import struct
2
3def sqrt_fast(S):
4 if S <= 0:
5 return 0.0
6
7 # Extract bits of the float
8 bits = struct.unpack('I', struct.pack('f', S))[0]
9
10 # Halve the exponent (right-shift by 1) for an approximate sqrt
11 # This gives a rough estimate within a factor of 2
12 bits = (bits >> 1) + 0x1FC00000 # Magic constant for float32
13
14 x = struct.unpack('f', struct.pack('I', bits))[0]
15
16 # Two Newton iterations for full float32 precision
17 x = (x + S / x) / 2.0
18 x = (x + S / x) / 2.0
19
20 return x
21
22print(sqrt_fast(2.0)) # 1.4142135381698608
23print(sqrt_fast(100.0)) # 10.0
Fast Inverse Square Root
The famous Quake III algorithm computes 1/sqrt(x) using bit-level tricks:
1// C implementation — the legendary fast inverse square root
2float fast_inv_sqrt(float number) {
3 long i;
4 float x2, y;
5 const float threehalfs = 1.5f;
6
7 x2 = number * 0.5f;
8 y = number;
9 i = *(long *)&y; // Interpret float bits as integer
10 i = 0x5f3759df - (i >> 1); // Magic constant
11 y = *(float *)&i; // Back to float
12 y = y * (threehalfs - (x2 * y * y)); // One Newton iteration
13 // y = y * (threehalfs - (x2 * y * y)); // Optional second iteration
14
15 return y;
16}
17
18// sqrt(x) = x * (1/sqrt(x))
19float fast_sqrt(float x) {
20 return x * fast_inv_sqrt(x);
21}
Python equivalent:
1import struct
2
3def fast_inv_sqrt(number):
4 x2 = number * 0.5
5 # Interpret float bits as int
6 i = struct.unpack('i', struct.pack('f', number))[0]
7 i = 0x5f3759df - (i >> 1) # Magic constant
8 y = struct.unpack('f', struct.pack('i', i))[0]
9 # Newton iteration
10 y = y * (1.5 - x2 * y * y)
11 return y
12
13print(1.0 / fast_inv_sqrt(2.0)) # ~1.4142
Binary Search Method
Simple and works for integer or fixed-point square roots:
1def sqrt_binary_search(S, tolerance=1e-10):
2 if S < 0:
3 raise ValueError("Negative input")
4 if S < 1:
5 lo, hi = S, 1.0
6 else:
7 lo, hi = 1.0, S
8
9 while hi - lo > tolerance:
10 mid = (lo + hi) / 2.0
11 if mid * mid > S:
12 hi = mid
13 else:
14 lo = mid
15
16 return (lo + hi) / 2.0
17
18print(sqrt_binary_search(2)) # 1.4142135623724243
19print(sqrt_binary_search(0.5)) # 0.7071067811865234
Convergence is linear (one bit of precision per iteration), much slower than Newton's quadratic convergence.
Integer Square Root
For exact integer square roots (floor of sqrt):
1def isqrt(n):
2 """Integer square root using Newton's method."""
3 if n < 0:
4 raise ValueError("Square root not defined for negative numbers")
5 if n == 0:
6 return 0
7 x = n
8 y = (x + 1) // 2
9 while y < x:
10 x = y
11 y = (x + n // x) // 2
12 return x
13
14print(isqrt(144)) # 12
15print(isqrt(150)) # 12 (floor)
16print(isqrt(10**18)) # 1000000000
17
18# Python 3.8+ has math.isqrt
19import math
20print(math.isqrt(144)) # 12
Using Standard Library
1import math
2
3# Standard sqrt (uses hardware instruction)
4print(math.sqrt(2)) # 1.4142135623730951
5
6# For complex numbers
7import cmath
8print(cmath.sqrt(-1)) # 1j
9
10# NumPy (vectorized)
11import numpy as np
12arr = np.array([1, 4, 9, 16, 25])
13print(np.sqrt(arr)) # [1. 2. 3. 4. 5.]
Common Pitfalls
Negative input: sqrt(-1) raises ValueError in Python's math.sqrt. Use cmath.sqrt for complex results. NumPy's np.sqrt(-1) returns nan with a warning.
Floating-point precision: sqrt(2)**2 does not equal exactly 2.0 due to IEEE 754 rounding. Use math.isclose(result**2, S) for comparisons, not ==.
Poor initial guess: Newton's method converges from any positive guess, but a bad guess (like x0 = 1e15 for sqrt(4)) requires many more iterations. Use the exponent-halving trick for a good starting point.
Integer overflow in bit tricks: The fast inverse square root trick assumes 32-bit float and 32-bit int. Using it with 64-bit doubles requires a different magic constant (0x5fe6eb50c7b537a9).
Division by zero in Newton's iteration: If the initial guess is 0, S / x causes division by zero. Always check for S == 0 before entering the iteration loop.
Summary
Newton's method (x = (x + S/x) / 2) converges quadratically — the standard algorithm for sqrt
Use IEEE 754 bit manipulation for a good initial guess, reducing iterations to 3-4 for full precision
The fast inverse square root (0x5f3759df trick) is historical — modern CPUs have dedicated sqrt instructions
math.sqrt() uses hardware instructions and is always the best choice for production code
For integer square roots, use math.isqrt() (Python 3.8+) for exact floor results