 # Improving the speed of your Optimization in R using Rcpp

Problem: Cost function of optimization needs a speed up Recently I was interested in optimizing the location of transport hubs within a large urban area and this led me to using R to perform some numerical optimization. After some rooting around it became clear to me that my cost function was more expensive than I would have liked approximately 3.5 seconds per execution so I looked into ways of reducing this. My cost function was as follows:

 ```#min.distance function for measuring efficiency of layout for optimisation min.distance <- function(data, par) { #calculate the distance from each point to each transport hub M <- matrix( nrow=nrow(data) , ncol=length(par)/2 ) for ( i in seq(1, length(par), by=2) ) { col <- (i+1)/2 M[,col] <- (data\$px-par[i])^2 + (data\$py-par[i+1])^2 } #calculate the min distance to any transport hub minD <- vector(mode = "numeric" , length=nrow(data) ) for ( i in seq(1, nrow(M)) ) { minD[i] <- min(M[i,]) } min.distance <- sum ( minD ) } ```

I found an extremely useful post called speed up the loop operation in r on stackover flow this led me down the path of using the Rcpp to develop ultra fast code.

• Using Rcpp I reduced the execution time of my cost function from ~3.5 seconds to 0.5 seconds for my particular dataset – that’s a x7 speed up!

Solution: How I used Rcpp inline library with optim to get x40 speed up! To use Rcpp I needed to install R-Tools for Windows so that I was able to compile c++ code on the command line. This is very straight forward:

1. Simply goto the Rtools section on r-project.org and download the Rtools version for the version of R you want to use. I downloaded Rtools 3.2.
2. Install Rtools using all the default setting – I installed it in the same folder as my R installation ( “C:/R/” )
3. Open R and run the following code to install the packages and execute a quick test function:

 ```# install packages install.packages(c("Rcpp", "rbenchmark", "inline", "Runit")) # load main two packages library(Rcpp) library(inline) src <- "int n = as(ns); double x = as(xs); for (int i=0; i

The test function simply executes the following calculation 1/(1+x) n times. If your test function executes without any problems then you are ready to execute the C++ version of min.distance. My C++ min.distance function is syntactically similar to the R code:

 ```#min.distance.c function library(inline) src <-" Rcpp::NumericMatrix datacpp(data); //nx2 matrix Rcpp::NumericVector parcpp(par); //vector int nrowdata = datacpp.nrow(); int szpar = parcpp.size(); int hszpar = szpar / 2; // Calculate square distance from each point to each transport hub std::vector< std::vector > M ( nrowdata , std::vector ( hszpar ) ); for (int i=0; i

To find out more about Rcpp I recommend the following workshop slides as I found them a useful resource for understanding how-to use Rcpp + inline. If you are new to C++ you will notice the first time that you create the function the code will need to be compiled into executable code – once this has happened you code will execute lighting fast. Note: other than min.distance.c expecting a matrix rather than a data.frame it performs the same operation as min.distance and thus may be substituted as the cost function in optim. Performance Testing To test the performance difference I used the Rbenchmark library and ran 10 iterations for each version for comparision.

 Length (Par) CPP R 2 0.3 0.86 4 0.23 6.75 6 0.36 9.9 8 1.17 20.94 10 1.39 53.26 12 2.53 64.27

As can be seen from the data above optim with the min.distance.c as cost function took 1.39 seconds to execute whilst optim with min.distance as cost function took 53.26 seconds – that’s a 38.3 speed improvement  something that would have take 24 hours to run will now take 40 minutes! Comparison of performance of optim using Rcpp vs R cost functions.

Here’s my benchmarking code so you can use it for your own needs:

 ```#many stations case buildings=data.frame( px=rnorm(500, 0 , 1), py=rnorm(500, 0 , 1) ) mbuildings <- as.matrix(buildings) hubs=runif(10, 0 , 1) #regular r function min.distance <- function(data, par) { #calculate the distance from each point to each transport hub M <- matrix( nrow=nrow(data) , ncol=length(par)/2 ) for ( i in seq(1, length(par), by=2) ) { col <- (i+1)/2 M[,col] <- (data\$px-par[i])^2 + (data\$py-par[i+1])^2 } #calculate the min distance to any transport hub minD <- vector(mode = "numeric" , length=nrow(data) ) for ( i in seq(1, nrow(M)) ) { minD[i] <- min(M[i,]) } min.distance <- sum ( minD ) } #min.distance cpp function library(inline) src <-" Rcpp::NumericMatrix datacpp(data); //nx2 matrix Rcpp::NumericVector parcpp(par); //vector int nrowdata = datacpp.nrow(); int szpar = parcpp.size(); int hszpar = szpar / 2; // Calculate square distance from each point to each transport hub std::vector< std::vector > M ( nrowdata , std::vector ( hszpar ) ); for (int i=0; i
Advertisements

This site uses Akismet to reduce spam. Learn how your comment data is processed.