Numba is a just-in-time (JIT) compiler for Python. With a few simple annotations, array-oriented and math-heavy Python code can be JIT-compiled to reach performance similar to C, C++ and Fortran, without having to switch languages or Python interpreters.
Lunch Time Python, Scientific Software Center, Heidelberg University
conda install numba
python -m pip install numba
Toy example: implement a vector reduction operation:
$ r(x, y) = \sum_i \cos(x_i) \sin(y_i) $
Some random vectors to benchmark our functions:
import numpy as np
x = np.random.uniform(low=-1, high=1, size=5000000)
y = np.random.uniform(low=-1, high=1, size=5000000)
import math
def r_python(x_vec, y_vec):
    s = 0
    for x, y in zip(x_vec, y_vec):
        s += math.cos(x) * math.sin(y)
    return s
r_python(x, y)
-1728.9213022976226
%timeit r_python(x,y)
1.83 s ± 26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
def r_numpy(x_vec, y_vec):
    return np.dot(np.cos(x_vec), np.sin(y_vec))
r_numpy(x, y)
-1728.921302297671
%timeit r_numpy(x,y)
41.7 ms ± 222 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
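The pure Python and NumPy results above agree only to about 13 significant digits. This is expected: the loop adds the terms one by one, while np.dot may add them in a different order, and floating-point addition is not associative. A minimal sketch of comparing two implementations with a tolerance instead of exact equality follows; the seed, the reduced array size and the name r_loop are assumptions chosen to keep the example small.

```python
import math
import numpy as np

# Small reproducible inputs (seed and size are arbitrary choices for this sketch)
rng = np.random.default_rng(0)
x_small = rng.uniform(low=-1, high=1, size=10_000)
y_small = rng.uniform(low=-1, high=1, size=10_000)

def r_loop(x_vec, y_vec):
    # Same element-by-element reduction as r_python
    s = 0.0
    for a, b in zip(x_vec, y_vec):
        s += math.cos(a) * math.sin(b)
    return s

loop_result = r_loop(x_small, y_small)
numpy_result = np.dot(np.cos(x_small), np.sin(y_small))

# Compare with a tolerance: the summation order may differ between the two
results_agree = bool(np.isclose(loop_result, numpy_result, rtol=1e-9, atol=1e-9))
print(results_agree)
```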
# pip install cython
%load_ext cython
%%cython
import math
def r_cython(x_vec, y_vec):
    s = 0
    for x, y in zip(x_vec, y_vec):
        s += math.cos(x) * math.sin(y)
    return s
r_cython(x, y)
-1728.9213022976226
%timeit r_cython(x,y)
1.38 s ± 27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%cython
import math
# use C math functions
from libc.math cimport sin, cos
# use C types instead of Python types
def r_cython(double[:] x_vec, double[:] y_vec):
    cdef double s = 0
    cdef int i
    for i in range(len(x_vec)):
        s += cos(x_vec[i]) * sin(y_vec[i])
    return s
r_cython(x, y)
-1728.9213022976226
%timeit r_cython(x,y)
114 ms ± 72.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
if "google.colab" in str(get_ipython()):
    !pip install fortran-magic -qqq
%load_ext fortranmagic
%%fortran
subroutine r_fortran(x_vec, y_vec, res)
    real, intent(in) :: x_vec(:), y_vec(:)
    real, intent(out) :: res
    integer :: i, n
    n = size(x_vec)
    res = 0
    do i = 1, n
        res = res + cos(x_vec(i)) * sin(y_vec(i))
    enddo
endsubroutine r_fortran
r_fortran(x, y)
-1728.9290771484375
%timeit r_fortran(x,y)
84.1 ms ± 1.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
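The Fortran result differs from the other implementations from the third decimal place on. A likely explanation, stated here as an assumption rather than something the source confirms, is that `real` is single precision with common compilers, so the inputs are converted to 32-bit floats and the sum is accumulated in 32 bits. The effect can be imitated in NumPy. The names and the reduced array size below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
x_small = rng.uniform(low=-1, high=1, size=10_000)
y_small = rng.uniform(low=-1, high=1, size=10_000)

# Reference value in double precision (64-bit)
r_double = np.dot(np.cos(x_small), np.sin(y_small))

# Same computation with everything in single precision (32-bit);
# cumsum accumulates element by element, similar to the Fortran loop
x32 = x_small.astype(np.float32)
y32 = y_small.astype(np.float32)
terms32 = np.cos(x32) * np.sin(y32)
r_single = np.cumsum(terms32)[-1]

print(r_double, r_single, abs(r_double - float(r_single)))
```

If this is indeed the cause, declaring the Fortran variables as double precision should bring the result in line with the others, at some cost in speed.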
if "google.colab" in str(get_ipython()):
    !pip install git+https://github.com/aldanor/ipybind.git -qqq
%load_ext ipybind
%%pybind11
#include <pybind11/numpy.h>
#include <cmath>
PYBIND11_PLUGIN(example) {
    py::module m("example");
    m.def("r_pybind", [](const py::array_t<double>& x, const py::array_t<double>& y) {
        double sum{0};
        // unchecked<1>() gives direct 1-d access without bounds checks
        auto rx{x.unchecked<1>()};
        auto ry{y.unchecked<1>()};
        for (py::ssize_t i = 0; i < rx.shape(0); i++) {
            sum += std::cos(rx[i]) * std::sin(ry[i]);
        }
        return sum;
    });
    return m.ptr();
}
r_pybind(x, y)
-1728.9213022976226
%timeit r_pybind(x, y)
106 ms ± 56.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
from numba import jit
@jit
def r_numba(x_vec, y_vec):
    s = 0
    for x, y in zip(x_vec, y_vec):
        s += math.cos(x) * math.sin(y)
    return s
/tmp/ipykernel_2249/1652357933.py:5: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details. def r_numba(x_vec, y_vec):
r_numba(x, y)
-1728.9213022976226
# pure python with numba JIT
%timeit r_numba(x,y)
113 ms ± 468 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Two compilation modes:
nopython mode (default)
object mode (fallback), used when nopython mode is not possible
To disable the fallback, pass nopython=True or use @njit
You can optionally specify the function signature explicitly. Use cases:
from numba import float32
@jit(float32(float32, float32))
def sum(a, b):
    return a + b
sum(1, 0.99999999)
2.0
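The result 2.0 follows from the float32 signature: the arguments are converted to single precision before the addition, and 0.99999999 is closer to 1.0 than to any other float32 value. A small NumPy-only sketch of the same rounding (the variable names are illustrative):

```python
import numpy as np

a = np.float32(1)
b = np.float32(0.99999999)   # not exactly representable in float32

print(b == np.float32(1.0))  # b has been rounded to 1.0
print(a + b)                 # float32 addition
print(1 + 0.99999999)        # float64 (plain Python) addition
```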
nopython=True: disable object mode fallback
nogil=True: release the Python Global Interpreter Lock (GIL)
cache=True: cache the compiled functions on disk
parallel=True: enable automatic parallelization
With the parallel=True option enabled, use prange to explicitly parallelize a loop over a range:
from numba import jit, prange
@jit(parallel=True)
def r_numba(x_vec, y_vec):
    s = 0
    for i in prange(len(x_vec)):
        # index the function arguments, not the global arrays x and y
        s += math.cos(x_vec[i]) * math.sin(y_vec[i])
    return s
r_numba(x, y)
-1728.921302297651
%timeit r_numba(x,y)
54.5 ms ± 316 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
A ufunc is a function that operates on scalars and is applied element-wise to arrays.
Write one with @numba.vectorize and use it like the built-in numpy ufuncs:
from numba import vectorize, float64
@vectorize([float64(float64, float64)], target="parallel")
def r(x, y):
    return np.cos(x) * np.sin(y)
r(2, 3)
-0.05872664492762098
r(x, y)
array([-0.28745359, -0.253076 , -0.55404343, ..., 0.66926636, -0.01654682, 0.06856152])
np.sum(r(x, y))
-1728.9213022976726
%timeit np.sum(r(x,y))
92.1 ms ± 696 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
@generated_jit: decorator for compile-time logic, e.g. type specializations
@stencil: decorator for creating a stencil to apply to an array
@cfunc: decorator to generate a C callback (e.g. to pass to scipy.integrate)