Today we provide a helpful script, inspired by R’s microbenchmark package, for benchmarking in Python. We then use this script to compare random number generation (RNG) in Python and R. You may want to view a previous blog post on R vs Python.

At the moment, if you want to benchmark functions in Python there are few tools beyond basic wall-clock timing. The timeit module is available, but its usage syntax is not the most convenient. So, after using the microbenchmark package, we were inspired to write a similar simple script in Python.
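For reference, the standard-library route looks something like this (a minimal sketch; the expression and repeat count are just illustrative):

```python
import timeit

# timeit runs the statement `number` times and returns the TOTAL elapsed
# time in seconds, so a per-call figure needs a manual division.
total = timeit.timeit("sum(range(1000))", number=100)
per_call = total / 100
```

Getting quartiles, or a tidy table comparing several expressions, takes extra plumbing on top of this, which is what the script below wraps up.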

The code in this blog entry was run on Ubuntu 14.04 (Core i7 processor, 16 GB RAM), using the g++ 4.8.2 compiler, R 3.1.0, and Python 2.7.6. The script is now located on GitHub.

We load a few Python modules that will be used in the script:

```
import collections as col
import numpy as np
import time as tt
import tabulate as tab
```

Next we create a `SimpleTable` object: a simple table-like class that inherits from `OrderedDict` in the `collections` library. It is rudimentary and designed to hold the data from the benchmarks, but it does no checking and has no advanced features. The keys of the `dict` object represent the column headers, and the column data is held in the values:

```
class SimpleTable(col.OrderedDict):
    """
    SimpleTable is just an OrderedDict with some convenience functions.
    It is not designed to be full featured, nor does it do any checking
    of the table structure.
    The keys represent the column headers and the values represent the
    columns.
    """
    def nrow(self):
        """
        Returns the number of rows in the table
        """
        return len(self[list(self.viewkeys())[0]])

    def ncol(self):
        """
        Returns the number of columns in the table
        """
        return len(self)

    def rbind(self, row):
        """
        Binds any suitable dict, OrderedDict, or SimpleTable
        to the current table
        :param row: suitable dict or SimpleTable of rows to be bound
        """
        for k in self.viewkeys():
            for i in np.arange(len(row[k])):
                self[k].append(row[k][i])

    def __repr__(self):
        return tab.tabulate(self, headers = "keys")
```
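To make the column-wise layout concrete, here is a small sketch of the convention the class relies on, using a plain `OrderedDict` and Python 3 style syntax (the column names and values are just illustrative):

```python
import collections as col

# Each key is a column header; each value is a list holding that column.
t1 = col.OrderedDict([('expr', ['f()']), ('median', [1.2])])
t2 = col.OrderedDict([('expr', ['g()']), ('median', [3.4])])

# rbind-style append: extend each column of t1 with the matching column of t2.
for k in t1.keys():
    t1[k].extend(t2[k])

# t1 now holds two rows: expr -> ['f()', 'g()'], median -> [1.2, 3.4]
```

This is the same append-by-column loop that `rbind` performs element by element.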

Below we have a function that converts and rounds timings from seconds to milliseconds (`"ms"`), microseconds (`"us"`), nanoseconds (`"ns"`), or rounded seconds (`"s"`), or simply returns the `"raw"` timing output from the `tt.time` function.

```
# Function to transform and round time units
def timeUnits(times, unit = "ms", ndigits = 5):
    """
    Formats times for the bench function
    :param times: list of times to be formatted
    :param unit: string the units of time: "ms" (milliseconds), "us" (microseconds),
                 "ns" (nanoseconds), "s" (seconds), "raw" (raw time in seconds)
    :param ndigits: number of decimal places the times are rounded to
    :return: list of formatted times
    """
    if unit == "ms":
        return [round(i * 1E3, ndigits) for i in times]
    elif unit == "us":
        return [round(i * 1E6, ndigits) for i in times]
    elif unit == "ns":
        return [round(i * 1E9, ndigits) for i in times]
    elif unit == "s":
        return [round(i, ndigits) for i in times]
    elif unit == "raw":
        return times
    else:
        raise ValueError("Unknown time unit: " + unit)
```
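For example, a pair of raw timings in seconds converts like this (a quick sketch of the same scaling arithmetic; the values are just illustrative):

```python
# Seconds scale by 1e3, 1e6, and 1e9 for ms, us, and ns respectively.
raw = [0.00123, 0.00456]             # raw times in seconds
ms = [round(t * 1E3, 5) for t in raw]
us = [round(t * 1E6, 5) for t in raw]
# ms == [1.23, 4.56], us == [1230.0, 4560.0]
```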

Next we provide three functions for benchmarking. `bench` is the workhorse that carries out the benchmark of a single case defined in a string, `lbenchmark` benchmarks a list of strings to be evaluated, and `benchmark` allows multiple strings to be entered one after the other as separate arguments. So for common use, the `lbenchmark` and `benchmark` functions are the ones to reach for; take your pick.

The `bench` function is given below:

```
def bench(sExpr, neval = 100, units = "ms", ndigits = 5):
    """
    :param sExpr: string expression to be evaluated
    :param neval: number of evaluations that the statistics will be calculated from
    :param units: string the units of time: "ms" (milliseconds), "us" (microseconds),
                  "ns" (nanoseconds), "s" (seconds), "raw" (raw time in seconds)
    :param ndigits: number of decimal places the times are rounded to
    :return: SimpleTable of the min, lower quartile, median, upper quartile, and max
             times, the expression run, and the number of evaluations
    """
    times = np.ndarray(shape = (neval,), dtype = np.float64)
    expr = compile(sExpr, "<string>", "eval")
    for i in np.arange(neval):
        start = tt.time()
        out = eval(expr)
        end = tt.time()
        times[i] = end - start
    times = np.percentile(times, [0, 25, 50, 75, 100])
    times = timeUnits(times, units, ndigits)
    summ = SimpleTable([('expr', [sExpr]), ('min', [times[0]]), ('lq', [times[1]]),
                        ('median', [times[2]]), ('uq', [times[3]]), ('max', [times[4]]),
                        ('neval', [neval])])
    return summ
```
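The core timing loop reduces to a few lines of standard-library Python (a simplified sketch without the numpy percentiles or table output; the expression being timed is just illustrative):

```python
import time

sExpr = "sum(range(1000))"
expr = compile(sExpr, "<string>", "eval")   # compile once, evaluate many times

times = []
for _ in range(100):
    start = time.time()
    eval(expr)
    times.append(time.time() - start)

# Summarise with min / median / max; bench adds the quartiles via np.percentile.
times.sort()
fastest, median, slowest = times[0], times[len(times) // 2], times[-1]
```

Compiling the string once up front keeps the parse cost out of the timed region, which is the same trick `bench` uses.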

The `lbenchmark` function is given below:

```
def lbenchmark(lExpr, **kwargs):
    """
    List version of benchmark; takes in a list of string expressions as lExpr
    :param lExpr: list of strings to be evaluated
    :param kwargs: keyword arguments passed to bench (neval, units, ndigits)
    :return: SimpleTable of the min, lower quartile, median, upper quartile, and max
             times, the expressions run, and the number of evaluations
    """
    nExpr = len(lExpr)
    out = bench(lExpr[0], **kwargs)
    if nExpr > 1:
        for i in np.arange(1, nExpr):
            out.rbind(bench(lExpr[i], **kwargs))
    return out
```

The `benchmark` function is given below:

```
def benchmark(*args, **kwargs):
    """
    Runs bench underneath for multiple expression strings
    :param args: the strings to be evaluated
    :param kwargs: keyword arguments passed to bench (neval, units, ndigits)
    :return: SimpleTable of the min, lower quartile, median, upper quartile, and max
             times, the expressions run, and the number of evaluations
    """
    nExpr = len(args)
    out = bench(args[0], **kwargs)
    if nExpr > 1:
        for i in np.arange(1, nExpr):
            out.rbind(bench(args[i], **kwargs))
    return out
```

Here we give examples that benchmark a few functions for sampling from statistical distributions and compare them to their R counterparts. Both R and Python use the Mersenne Twister as the RNG engine:

```
# Units are microseconds ...
>>> evalStrings = ["np.random.uniform(size=1000)", "np.random.normal(size=1000)",
"np.random.binomial(size=1000, n=10, p = .5)", "np.random.poisson(size=1000, lam=1)"]
>>> bench(evalStrings[0], units = "us")
expr min lq median uq max neval
---------------------------- ------- ------- -------- ------ ------- -------
np.random.uniform(size=1000) 23.8419 24.0803 25.034 25.034 46.9685 100
>>> print benchmark("np.random.uniform(size=1000)", "np.random.normal(size=1000)",
"np.random.binomial(size=1000, n=10, p = .5)", "np.random.poisson(size=1000, lam=1)",
units="us")
expr min lq median uq max neval
------------------------------------------- ------- ------- -------- -------- -------- -------
np.random.uniform(size=1000) 15.974 16.9277 16.9277 17.345 36.9549 100
np.random.normal(size=1000) 56.982 57.9357 57.9357 58.8894 148.058 100
np.random.binomial(size=1000, n=10, p = .5) 57.9357 58.8894 64.1346 147.104 190.02 100
np.random.poisson(size=1000, lam=1) 61.9888 63.1809 66.0419 71.0487 105.143 100
>>> lbenchmark(evalStrings, units = "us")
expr min lq median uq max neval
------------------------------------------- ------- ------- -------- ------- -------- -------
np.random.uniform(size=1000) 24.7955 25.9876 25.9876 28.1334 56.0284 100
np.random.normal(size=1000) 76.0555 78.9166 85.1154 86.0691 168.085 100
np.random.binomial(size=1000, n=10, p = .5) 59.1278 64.8499 66.5188 96.0827 120.163 100
np.random.poisson(size=1000, lam=1) 64.8499 69.6778 70.0951 71.0487 97.99 100
```

```
> require(microbenchmark)
Loading required package: microbenchmark
> n = 1000
> microbenchmark(runif(n), rnorm(n), rbinom(n, p = .5, size = 10), rpois(n, 1))
Unit: microseconds
expr min lq median uq max neval
runif(n) 37.317 39.6890 40.914 42.2355 70.507 100
rnorm(n) 82.480 85.4910 86.836 88.2970 154.761 100
rbinom(n, p = 0.5, size = 10) 92.181 94.5645 95.781 97.2435 153.588 100
rpois(n, 1) 39.534 41.3360 42.541 43.7010 85.183 100
```

We have created a few easy-to-use functions for quickly carrying out benchmarks. These tools give programmers an idea of how different functions will perform, and we have provided some examples of usage. The script brings some R-like sugar to Python.

The timed RNG tests reveal slightly better RNG performance from Python overall (particularly for the uniform distribution, which is more than twice as fast). We will probably come back to this topic in a later blog post.