ChainLadder in < 2 microseconds with Julia's ccall()?

Author: Dr Chibisi Chima-Okereke Created: August 11, 2014 00:00:00 GMT Published: August 11, 2014 00:00:00 GMT

For some time now we have been playing ‘limbo’ with execution times on the Chain Ladder algorithm. Last time we put it through Julia and with some tinkering got it down to about 25 microseconds. This time we are claiming less than 2 microseconds by using Julia’s ccall() function that allows programmers to call C and Fortran libraries.

The code in this blog entry is run/compiled on Ubuntu 14.04, core i7 processor having 16 GB RAM, using julia version 0.2.1 and gcc version 4.8.2 compiler. The scripts are available on Github.

The algorithm

As we mentioned previously, Julia’s C++ api is still in its infancy, so we decided to go with pure C with this one. To pass a 2D array of floats to ccall() you need to convert it to an array of pointers to 1D floats in Julia before sending to C. We did not like the idea of having to construct an array of pointers in Julia - because it would mean doing more tasks which may slow down the algorithm. After a bit of head-scrathing we decided to unfold the claims triangle into a 1D array, send it using a single pointer to be processed in C and then reshape that array back into a 2D array holding the complete claims triangle.

The C Code

The C code functions for calculating the “age-to-age” factors and chain ladder are given below:

/* File: speed-cl.c */
#include <stdio.h>;
/* Function to calculate the age-to-age factors*/
void get_factor(double *vec, double *fact, int p)
{
int j, m;
double tmp[2];
for(j = 0; j &lt; p-1; ++j)
{
tmp[0] = 0.; tmp[1] = 0.;
for(m = 0; m &lt; (p - (j + 1)); ++m)
{
tmp[0] += vec[(m + j*p)];
tmp[1] += vec[(m + (1 + j)*p)];
}
fact[j] = tmp[1]/tmp[0];
}
}
void cl(double *vec, double *fact, int p)
{
int i, j, k;
for(i = p - 1; i &gt; 0; --i)
{
for(j = 0; j &lt; p - i; ++j)
{
k = p*i + (p - i) + j*p;
vec[k] = vec[k - p]*fact[i+j-1];
}
}
}


The Julia Code

# Chain Ladder function
p = size(tri)[1]
tri = reshape(tri, (p*p, 1))
fact = zeros(p)
ccall((:get_factor, "speed-cl"), Void, (Ptr{Float64}, Ptr{Float64}, Int32), pointer(tri), pointer(fact), p)
ccall((:cl, "speed-cl"), Void, (Ptr{Float64}, Ptr{Float64}, Int32), pointer(tri), pointer(fact), p)
return reshape(tri, (p, p))
end
function bench(n::Int)
output = [@elapsed chain_ladder(x) for i = 1:n]
output = output*1E6 # microseconds
return DataFrames.DataFrame(min = quantile(output, 0), lq = quantile(output, 0.25), median = quantile(output, 0.5),
uq = quantile(output, 0.75), max = quantile(output, 1), neval = n)
end


Benchmarks

bench(1000)
1x6 DataFrame:
min    lq median      uq   max neval
[1,]    1.586 1.604  1.614 1.68475 8.172  1000


That’s a median time of less than 2 microseconds.

Conclusion

Sometimes imposing restrictions on the way you write code can lead to some unforseen benefits, in this case we became more creative with the way we carried out the chain ladder calculation which led to a very nice uplift in performance. What restrictions can you impose on the way you work to gain efficiency benefits from increased performance? Be careful not to deprive yourself of tools that you need to maintain your effectiveness.