**Problem: Cost function of optimization needs a speed up** Recently I was interested in optimizing the location of transport hubs within a large urban area and this led me to using R to perform some numerical optimization. After some rooting around it became clear to me that my cost function was more expensive than I would have liked approximately 3.5 seconds per execution so I looked into ways of reducing this. My cost function was as follows:

#min.distance function for measuring efficiency of layout for optimisation min.distance <- function(data, par) { #calculate the distance from each point to each transport hub M <- matrix( nrow=nrow(data) , ncol=length(par)/2 ) for ( i in seq(1, length(par), by=2) ) { col <- (i+1)/2 M[,col] <- (data$px-par[i])^2 + (data$py-par[i+1])^2 } #calculate the min distance to any transport hub minD <- vector(mode = "numeric" , length=nrow(data) ) for ( i in seq(1, nrow(M)) ) { minD[i] <- min(M[i,]) } min.distance <- sum ( minD ) } |

I found an extremely useful post called speed up the loop operation in r on stackover flow this led me down the path of using the Rcpp to develop ultra fast code.

- Using Rcpp I reduced the execution time of my cost function from ~3.5 seconds to 0.5 seconds for my particular dataset – that’s a
**x7 speed up!**

**Solution: How I used Rcpp inline library with optim to get x40 speed up!** To use Rcpp I needed to install R-Tools for Windows so that I was able to compile c++ code on the command line. This is very straight forward:

- Simply goto the Rtools section on r-project.org and download the Rtools version for the version of R you want to use. I downloaded Rtools 3.2.
- Install Rtools using all the default setting – I installed it in the same folder as my R installation ( “C:/R/” )
- Open R and run the following code to install the packages and execute a quick test function:

# install packages install.packages(c("Rcpp", "rbenchmark", "inline", "Runit")) # load main two packages library(Rcpp) library(inline) src <- "int n = as(ns); double x = as(xs); for (int i=0; i<n; i++) x=1/(1+x); return wrap(x); " test <- cxxfunction(signature(ns="integer",xs="numeric"),body=src, plugin="Rcpp") test(1,1) |

The test function simply executes the following calculation 1/(1+x) n times. If your test function executes without any problems then you are ready to execute the C++ version of min.distance. My C++ min.distance function is syntactically similar to the R code:

#min.distance.c function library(inline) src <-" Rcpp::NumericMatrix datacpp(data); //nx2 matrix Rcpp::NumericVector parcpp(par); //vector int nrowdata = datacpp.nrow(); int szpar = parcpp.size(); int hszpar = szpar / 2; // Calculate square distance from each point to each transport hub std::vector< std::vector > M ( nrowdata , std::vector ( hszpar ) ); for (int i=0; i<nrowdata; ++i) { for (int j=0; j<szpar-1; j+=2) { M[i][j/2] = pow( (datacpp(i,0) - parcpp[j]), 2.0) + pow((datacpp(i,1) - parcpp[j+1]) , 2.0); } } // Calculate the min distance to any transport hub double sssd=0; double smallest=0; for (int i=0; i<nrowdata; ++i) { smallest = M[i][0]; for (int j=0; j<hszpar; ++j) { if ( M[i][j] < smallest ) smallest = M[i][j]; } sssd += smallest; } return wrap(sssd); " min.distance.c <- cxxfunction(signature(data="numeric", par="numeric"), body=src, plugin="Rcpp") |

To find out more about Rcpp I recommend the following workshop slides as I found them a useful resource for understanding how-to use Rcpp + inline. If you are new to C++ you will notice the first time that you create the function the code will need to be compiled into executable code – once this has happened you code will execute lighting fast. Note: other than min.distance.c expecting a matrix rather than a data.frame it performs the same operation as min.distance and thus may be substituted as the cost function in optim. **Performance Testing** To test the performance difference I used the Rbenchmark library and ran 10 iterations for each version for comparision.

Length (Par) |
CPP |
R |

2 | 0.3 | 0.86 |

4 | 0.23 | 6.75 |

6 | 0.36 | 9.9 |

8 | 1.17 | 20.94 |

10 | 1.39 | 53.26 |

12 | 2.53 | 64.27 |

As can be seen from the data above optim with the min.distance.c as cost function took 1.39 seconds to execute whilst optim with min.distance as cost function took 53.26 seconds – that’s a **38.3 speed improvementÂ **something that would have take 24 hours to run will now take 40 minutes!

Here’s my benchmarking code so you can use it for your own needs:

#many stations case buildings=data.frame( px=rnorm(500, 0 , 1), py=rnorm(500, 0 , 1) ) mbuildings <- as.matrix(buildings) hubs=runif(10, 0 , 1) #regular r function min.distance <- function(data, par) { #calculate the distance from each point to each transport hub M <- matrix( nrow=nrow(data) , ncol=length(par)/2 ) for ( i in seq(1, length(par), by=2) ) { col <- (i+1)/2 M[,col] <- (data$px-par[i])^2 + (data$py-par[i+1])^2 } #calculate the min distance to any transport hub minD <- vector(mode = "numeric" , length=nrow(data) ) for ( i in seq(1, nrow(M)) ) { minD[i] <- min(M[i,]) } min.distance <- sum ( minD ) } #min.distance cpp function library(inline) src <-" Rcpp::NumericMatrix datacpp(data); //nx2 matrix Rcpp::NumericVector parcpp(par); //vector int nrowdata = datacpp.nrow(); int szpar = parcpp.size(); int hszpar = szpar / 2; // Calculate square distance from each point to each transport hub std::vector< std::vector > M ( nrowdata , std::vector ( hszpar ) ); for (int i=0; i<nrowdata; ++i) { for (int j=0; j<szpar-1; j+=2) { M[i][j/2] = pow( (datacpp(i,0) - parcpp[j]), 2.0) + pow((datacpp(i,1) - parcpp[j+1]) , 2.0); } } // Calculate the min distance to any transport hub double sssd=0; double smallest=0; for (int i=0; i<nrowdata; ++i) { smallest = M[i][0]; for (int j=0; j<hszpar; ++j) { if ( M[i][j] < smallest ) smallest = M[i][j]; } sssd += smallest; } return wrap(sssd); " min.distance.c <- cxxfunction(signature(data="numeric", par="numeric"), body=src, plugin="Rcpp") library ( rbenchmark ) benchmark ( optim(par = hubs, min.distance, data = buildings , method = "BFGS"), optim(par = hubs, min.distance.c, data = mbuildings , method = "BFGS"), columns=c("test", "replications", "elapsed", "relative"), order="relative", replications=10) |