ChainLadder in < 2 microseconds with Julia's ccall()?

Active Analytics Ltd: posted 11 August 2014 by Chibisi Chima-Okereke


For some time now we have been playing 'limbo' with execution times on the Chain Ladder algorithm. Last time we put it through Julia and with some tinkering got it down to about 25 microseconds. This time we are claiming less than 2 microseconds by using Julia's ccall() function that allows programmers to call C and Fortran libraries.

The code in this blog entry is run/compiled on Ubuntu 14.04, core i7 processor having 16 GB RAM, using julia version 0.2.1 and gcc version 4.8.2 compiler. The scripts are available on Github.

The algorithm

As we mentioned previously, Julia's C++ api is still in its infancy, so we decided to go with pure C with this one. To pass a 2D array of floats to ccall() you need to convert it to an array of pointers to 1D floats in Julia before sending to C. We did not like the idea of having to construct an array of pointers in Julia - because it would mean doing more tasks which may slow down the algorithm. After a bit of head-scrathing we decided to unfold the claims triangle into a 1D array, send it using a single pointer to be processed in C and then reshape that array back into a 2D array holding the complete claims triangle.

The C Code

The C code functions for calculating the "age-to-age" factors and chain ladder are given below:

/* File: speed-cl.c */
#include <stdio.h>

/* Function to calculate the age-to-age factors*/
void get_factor(double *vec, double *fact, int p)
	int j, m;
	double tmp[2];
	for(j = 0; j < p-1; ++j)
		tmp[0] = 0.; tmp[1] = 0.;
		for(m = 0; m < (p - (j + 1)); ++m)
			tmp[0] += vec[(m + j*p)];
			tmp[1] += vec[(m + (1 + j)*p)];
		fact[j] = tmp[1]/tmp[0];

/* Chain Ladder function */
void cl(double *vec, double *fact, int p)
	int i, j, k;
	for(i = p - 1; i > 0; --i)
		for(j = 0; j < p - i; ++j)
			k = p*i + (p - i) + j*p;
			vec[k] = vec[k - p]*fact[i+j-1];

The Julia Code

# Chain Ladder function
function chain_ladder(tri)
	p = size(tri)[1]
	tri = reshape(tri, (p*p, 1))
	fact = zeros(p)
	ccall((:get_factor, "speed-cl"), Void, (Ptr{Float64}, Ptr{Float64}, Int32), pointer(tri), pointer(fact), p)
	ccall((:cl, "speed-cl"), Void, (Ptr{Float64}, Ptr{Float64}, Int32), pointer(tri), pointer(fact), p)
	return reshape(tri, (p, p))

function bench(n::Int)
	output = [@elapsed chain_ladder(x) for i = 1:n]
	output = output*1E6 # microseconds
	return DataFrames.DataFrame(min = quantile(output, 0), lq = quantile(output, 0.25), median = quantile(output, 0.5), 
			uq = quantile(output, 0.75), max = quantile(output, 1), neval = n)

The the benchmarks


1x6 DataFrame:
          min    lq median      uq   max neval
[1,]    1.586 1.604  1.614 1.68475 8.172  1000

That's a median time of less than 2 microseconds.


Sometimes imposing restrictions on the way you write code can lead to some unforseen benefits, in this case we became more creative with the way we carried out the chain ladder calculation which led to a very nice uplift in performance. What restrictions can you impose on the way you work to gain efficiency benefits from increased performance? Be careful not to deprive yourself of tools that you need to maintain your effectiveness.

Data Science Consulting & Software Training

Active Analytics Ltd. is a data science consultancy, and Open Source Statistical Software Training company. Please contact us for more details or to comment on the blog.

Dr. Chibisi Chima-Okereke, Training, Statistics and Data Science Consulting.