## How safe is sapply?

### Introduction

Consider a function whose return type cannot be predicted from the types of its arguments. One would at least want a given set of argument types to always produce the same return type.

In R, the `sapply` function applies a given function to each element of a list or vector. If the results can be simplified, it returns a vector, matrix or array; if they cannot, it returns a list.
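The different return shapes can be seen in a minimal sketch. The inputs and the variable names (`as_vector`, `as_matrix`, `as_list`) are made up for illustration only.

```r
# Each call returns one value: sapply simplifies the results to a vector
as_vector <- sapply(1:3, function(i) i^2)

# Each call returns two values: sapply simplifies the results to a 2 x 3 matrix
as_matrix <- sapply(1:3, function(i) c(i, i^2))

# The calls return results of different lengths: no simplification, so a list
as_list <- sapply(1:3, function(i) seq_len(i))

is.numeric(as_vector)  # TRUE
is.matrix(as_matrix)   # TRUE
is.list(as_list)       # TRUE
```

The same function, `sapply`, is called on the same input each time; only the shape of what the applied function returns decides the container that comes back.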

Let's say I would like to know the class of each column in a table. Here is my table:

```
# Create the data frame
x <- data.frame(matrix(runif(50), ncol = 5))
x
#>            X1        X2        X3        X4         X5
#> 1  0.38978750 0.5917161 0.8253375 0.9212409 0.81378414
#> 2  0.19698452 0.9485623 0.5185512 0.4549364 0.08553984
#> 3  0.04561459 0.7605560 0.6194432 0.2125995 0.45285563
#> 4  0.12896352 0.9170430 0.1848815 0.6485884 0.85803866
#> 5  0.86279157 0.8874200 0.9246772 0.5906202 0.93091009
#> 6  0.56188260 0.5375034 0.7805531 0.7583699 0.73940692
#> 7  0.93033282 0.8891776 0.1700843 0.0617091 0.96898827
#> 8  0.12256712 0.3718221 0.5720081 0.9107144 0.63610281
#> 9  0.54072288 0.5103223 0.7364012 0.9315216 0.51058452
#> 10 0.77529338 0.1871945 0.1663368 0.2746300 0.26960322
# Copy the data frame to y
y <- x
# Change one of the columns in y to the POSIXct time class
# (the times below are printed in the local time zone)
y[, 5] <- as.POSIXct(seq(1, 10), origin = "2014-01-01")
y
#>            X1        X2        X3        X4                  X5
#> 1  0.38978750 0.5917161 0.8253375 0.9212409 2013-12-31 16:00:01
#> 2  0.19698452 0.9485623 0.5185512 0.4549364 2013-12-31 16:00:02
#> 3  0.04561459 0.7605560 0.6194432 0.2125995 2013-12-31 16:00:03
#> 4  0.12896352 0.9170430 0.1848815 0.6485884 2013-12-31 16:00:04
#> 5  0.86279157 0.8874200 0.9246772 0.5906202 2013-12-31 16:00:05
#> 6  0.56188260 0.5375034 0.7805531 0.7583699 2013-12-31 16:00:06
#> 7  0.93033282 0.8891776 0.1700843 0.0617091 2013-12-31 16:00:07
#> 8  0.12256712 0.3718221 0.5720081 0.9107144 2013-12-31 16:00:08
#> 9  0.54072288 0.5103223 0.7364012 0.9315216 2013-12-31 16:00:09
#> 10 0.77529338 0.1871945 0.1663368 0.2746300 2013-12-31 16:00:10
```

Now consider the following:

```
sapply(x, class)
#>        X1        X2        X3        X4        X5
#> "numeric" "numeric" "numeric" "numeric" "numeric"
sapply(y, class)
#> $X1
#> [1] "numeric"
#> $X2
#> [1] "numeric"
#> $X3
#> [1] "numeric"
#> $X4
#> [1] "numeric"
#> $X5
#> [1] "POSIXct" "POSIXt"
```

In the evaluation above, both inputs are data frames, yet the return value is a vector in one case and a list in the other. This kind of unexpected behaviour can cause all kinds of bother. One contributing factor is that a class in R is stored as a character vector, which can have more than one element, so the `POSIXct` class in this case is defined by a character vector of length two. This problem does not exist in Julia, where the type of a value is a single entity that is itself a type, so an array of types holds exactly one element per value.
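The length-two class vector is easy to confirm. When a single label per column is all that is needed, one possible workaround (a sketch, not part of the analysis above) is to keep only the first class of each column. The data frame `df`, its column names and the variable `first_class` are hypothetical.

```r
# Hypothetical data frame: one numeric column and one date-time column
df <- data.frame(value = c(0.1, 0.2, 0.3))
df$when <- as.POSIXct(c(1, 2, 3), origin = "2014-01-01", tz = "UTC")

# The class of a POSIXct vector is a character vector of length two
class(df$when)          # "POSIXct" "POSIXt"
length(class(df$when))  # 2

# Keeping only the first class gives exactly one string per column,
# so the result is always a named character vector
first_class <- vapply(df, function(col) class(col)[1], FUN.VALUE = character(1))
first_class
```

This discards the rest of the class vector, so it only suits cases where the leading class is enough; `inherits()` is the usual way to test membership in any of an object's classes.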

A safer thing to do is to use the `lapply` function, which always returns a list:

```
lapply(x, class)
#> $X1
#> [1] "numeric"
#> $X2
#> [1] "numeric"
#> $X3
#> [1] "numeric"
#> $X4
#> [1] "numeric"
#> $X5
#> [1] "numeric"
lapply(y, class)
#> $X1
#> [1] "numeric"
#> $X2
#> [1] "numeric"
#> $X3
#> [1] "numeric"
#> $X4
#> [1] "numeric"
#> $X5
#> [1] "POSIXct" "POSIXt"
```

To return vector, matrix, or array types, a safer option is the `vapply` function, which checks that every returned object has the same type and length as a template supplied through `FUN.VALUE`. An example use of `vapply`:

```
# Each call to runif(4) returns four numbers, matching the template rep(1, 4)
vapply(rep(4, 4), runif, FUN.VALUE = rep(1, 4))
#>           [,1]      [,2]       [,3]      [,4]
#> [1,] 0.7801901 0.6784725 0.08042061 0.7888491
#> [2,] 0.9243664 0.8080259 0.77640415 0.5587008
#> [3,] 0.9915787 0.1311069 0.98725875 0.2628835
#> [4,] 0.3421723 0.1678438 0.35767812 0.1722526
```
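The template check is what makes `vapply` safer: when a result does not match, the call stops with an error instead of quietly changing its return type. The sketch below, using a hypothetical list `cols`, catches that error with `tryCatch` so the contrast with `sapply` is visible; the replacement message is my own wording, not R's.

```r
# Hypothetical input: the second element has a two-element class vector
cols <- list(a = 1.5, b = as.POSIXct(1, origin = "2014-01-01", tz = "UTC"))

# sapply quietly falls back to a list
loose <- sapply(cols, class)
is.list(loose)  # TRUE

# vapply expects one string per element, so the length-two result is an error
strict <- tryCatch(
  vapply(cols, class, FUN.VALUE = character(1)),
  error = function(e) "vapply stopped with an error"
)
strict
```

Failing early like this is usually preferable in scripts and packages, where a silently different return type tends to surface much later as a confusing error.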

In summary, when you are iterating over a vector or list and returning a generic object, use `lapply`; if you are returning a basic type where each item returned has the same dimension, use `vapply`.

**Dr. Chibisi Chima-Okereke, R Training, Statistics and Data Analysis.**