Implementing Newton-Raphson algorithm with Numba

SciPy's optimize.newton function implements the Newton-Raphson root-finding algorithm. If we pass both the fprime and fprime2 arguments (the first and second derivatives), it switches to Halley's method. Here we implement a faster but less general version of this routine using Numba and compare its performance against SciPy's. We find the root of \(f(\theta) = \theta - \sin (\theta) - 2 A / r^2\).
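For reference, Halley's method rescales the Newton step \(f(x_n)/f'(x_n)\) using the second derivative:

\[
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \left[ 1 - \frac{f(x_n)\, f''(x_n)}{2 f'(x_n)^2} \right]^{-1}
\]

The bracketed correction corresponds to the `adj` term in the solver below; when the correction would be too aggressive (\(|\mathrm{adj}| \ge 1\)), the code falls back to the plain Newton step.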

import functools
from typing import Tuple

import numpy as np
from numba import njit

ngjit = functools.partial(njit, cache=True, nogil=True)


@ngjit("b1(f8, f8, f8, f8)")
def within_tol(x: float, y: float, atol: float, rtol: float) -> bool:
    """Check if two float numbers are within a tolerance."""
    return bool(np.abs(x - y) <= atol + rtol * np.abs(y))


@ngjit("f8(f8, UniTuple(f8, 2))")
def func_main(theta: float, args: Tuple[float, float]) -> float:
    """The main function to find the root of."""
    area, rad = args
    return theta - np.sin(theta) - 2 * area / rad**2


@ngjit("f8(f8)")
def func_der1(theta: float) -> float:
    """The first derivative of the main function."""
    return 1.0 - np.cos(theta)


@ngjit("f8(f8)")
def func_der2(theta: float) -> float:
    """The second derivative of the main function."""
    return np.sin(theta)


@ngjit("f8(f8, UniTuple(f8, 2))")
def halley_solver(x0: float, args: Tuple[float, float]) -> float:
    """Find root of ``func_main`` using Halley's method."""
    p0 = np.float64(x0)
    tol = 1.48e-8
    for _ in range(50):
        fval = func_main(p0, args)
        if within_tol(fval, 0.0, tol, 0.0):
            return p0

        fder = func_der1(p0)
        if within_tol(fder, 0.0, tol, 0.0):
            return np.inf  # derivative is (near) zero; cannot proceed
        newton_step = fval / fder

        # Since second derivative is available we can modify
        # the Newton-Raphson step based on Halley's method.
        fder2 = func_der2(p0)
        adj = 0.5 * newton_step * fder2 / fder

        if np.abs(adj) < 1:
            newton_step /= 1.0 - adj

        p0 -= newton_step
    return np.inf  # did not converge within 50 iterations

The root of this function for \(A=0.91\) and \(r=2\) is approximately 1.45. We intentionally use a far-off initial guess of 10000 to make the comparison more demanding.

%timeit halley_solver(10000, (0.91, 2))

1.44 µs ± 1.45 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
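As a quick sanity check of the numerics, independent of Numba, here is a plain-Python sketch of the same Halley loop. The helper name `halley_plain` and the initial guess of 1.0 are illustrative choices of mine, not part of the benchmark above:

```python
import math


def halley_plain(x0: float, area: float, rad: float) -> float:
    """Pure-Python rendition of the Halley iteration above (no Numba)."""
    p0 = x0
    tol = 1.48e-8
    for _ in range(50):
        fval = p0 - math.sin(p0) - 2 * area / rad**2
        if abs(fval) <= tol:
            return p0
        fder = 1.0 - math.cos(p0)
        if abs(fder) <= tol:
            return math.inf  # derivative vanished
        step = fval / fder
        # Halley's correction to the Newton step.
        adj = 0.5 * step * math.sin(p0) / fder
        if abs(adj) < 1:
            step /= 1.0 - adj
        p0 -= step
    return math.inf  # no convergence


root = halley_plain(1.0, 0.91, 2.0)
print(root)  # ≈ 1.4474
```

The result agrees with the root quoted above to the stated tolerance.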

Now, let’s use optimize.newton:

from scipy import optimize


def der1(theta: float, _) -> float:
    """The first derivative of the main function."""
    return 1.0 - np.cos(theta)


def der2(theta: float, _) -> float:
    """The second derivative of the main function."""
    return np.sin(theta)

%timeit optimize.newton(func_main, 10000, fprime=der1, fprime2=der2, args=((0.91, 2),))

1.12 ms ± 43.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

The Numba version yields almost a 1000× performance improvement. I should note that the reason I didn't pass the derivative functions as arguments to halley_solver is that you then cannot specify the Numba signatures explicitly. Technically, you can choose not to pass signatures to ngjit at all and let Numba infer the types on the first call, but to get the maximum performance I decided to pass them.