Improving the speed of your Optimization in R using Rcpp

Problem: Cost function of optimization needs a speed up Recently I was interested in optimizing the location of transport hubs within a large urban area and this led me to using R to perform some numerical optimization. After some rooting around it became clear to me that my cost function was more expensive than I would have liked approximately 3.5 seconds per execution so I looked into ways of reducing this. My cost function was as follows:

#min.distance function for measuring efficiency of layout for optimisation
min.distance <- function(data, par) {
	
	#calculate the distance from each point to each transport hub
	M <- matrix( nrow=nrow(data) , ncol=length(par)/2 )
	for ( i in seq(1, length(par), by=2) )
	{
		col <- (i+1)/2
		M[,col] <- (data$px-par[i])^2  + (data$py-par[i+1])^2 
	}

	#calculate the min distance to any transport hub
	minD <- vector(mode = "numeric" , length=nrow(data) )
	for ( i in seq(1, nrow(M)) )
	{
		minD[i] <- min(M[i,])
	}
	
	min.distance <- sum ( minD )
}

I found an extremely useful post called speed up the loop operation in r on stackover flow this led me down the path of using the Rcpp to develop ultra fast code.

  • Using Rcpp I reduced the execution time of my cost function from ~3.5 seconds to 0.5 seconds for my particular dataset – that’s a x7 speed up!

Solution: How I used Rcpp inline library with optim to get x40 speed up! To use Rcpp I needed to install R-Tools for Windows so that I was able to compile c++ code on the command line. This is very straight forward:

  1. Simply goto the Rtools section on r-project.org and download the Rtools version for the version of R you want to use. I downloaded Rtools 3.2.
  2. Install Rtools using all the default setting – I installed it in the same folder as my R installation ( “C:/R/” )
  3. Open R and run the following code to install the packages and execute a quick test function:

# install packages
install.packages(c("Rcpp", "rbenchmark", "inline", "Runit"))
 
# load main two packages
library(Rcpp)
library(inline)

src <- "int n = as(ns);
double x = as(xs);
for (int i=0; i<n; i++) x=1/(1+x);
return wrap(x); "

test <- cxxfunction(signature(ns="integer",xs="numeric"),body=src, plugin="Rcpp")
test(1,1)

The test function simply executes the following calculation 1/(1+x) n times. If your test function executes without any problems then you are ready to execute the C++ version of min.distance. My C++ min.distance function is syntactically similar to the R code:

#min.distance.c function
library(inline)
src <-"
  Rcpp::NumericMatrix datacpp(data); //nx2 matrix
  Rcpp::NumericVector parcpp(par); //vector
  int nrowdata = datacpp.nrow();
  int szpar = parcpp.size();
  int hszpar = szpar / 2;

  // Calculate square distance from each point to each transport hub
  std::vector< std::vector > M ( nrowdata , std::vector ( hszpar 

) );
  for (int i=0; i<nrowdata; ++i)
  {
	for (int j=0; j<szpar-1; j+=2)
	{
		M[i][j/2] = pow( (datacpp(i,0) - parcpp[j]), 2.0) + pow((datacpp(i,1) - parcpp[j+1]) , 2.0);
	}
  }

  // Calculate the min distance to any transport hub
  double sssd=0;
  double smallest=0;
  for (int i=0; i<nrowdata; ++i)
  {
	smallest = M[i][0];
	for (int j=0; j<hszpar; ++j)
	{
		if ( M[i][j] < smallest ) smallest = M[i][j];
	}
	sssd += smallest;
  }

  return wrap(sssd);
"
min.distance.c <- cxxfunction(signature(data="numeric", par="numeric"), body=src, plugin="Rcpp")

To find out more about Rcpp I recommend the following workshop slides as I found them a useful resource for understanding how-to use Rcpp + inline. If you are new to C++ you will notice the first time that you create the function the code will need to be compiled into executable code – once this has happened you code will execute lighting fast. Note: other than min.distance.c expecting a matrix rather than a data.frame it performs the same operation as min.distance and thus may be substituted as the cost function in optim. Performance Testing To test the performance difference I used the Rbenchmark library and ran 10 iterations for each version for comparision.

Length (Par) CPP R
2 0.3 0.86
4 0.23 6.75
6 0.36 9.9
8 1.17 20.94
10 1.39 53.26
12 2.53 64.27

As can be seen from the data above optim with the min.distance.c as cost function took 1.39 seconds to execute whilst optim with min.distance as cost function took 53.26 seconds – that’s a 38.3 speed improvement  something that would have take 24 hours to run will now take 40 minutes!

Comparison of performance of optim using Rcpp vs R cost functions.
Comparison of performance of optim using Rcpp vs R cost functions.

Here’s my benchmarking code so you can use it for your own needs:

#many stations case
buildings=data.frame(
	     px=rnorm(500, 0 , 1), 
              py=rnorm(500, 0 , 1)
)
mbuildings <- as.matrix(buildings)
hubs=runif(10, 0 , 1)

#regular r function
min.distance <- function(data, par) {
	
	#calculate the distance from each point to each transport hub
	M <- matrix( nrow=nrow(data) , ncol=length(par)/2 )
	for ( i in seq(1, length(par), by=2) )
	{
		col <- (i+1)/2
		M[,col] <- (data$px-par[i])^2  + (data$py-par[i+1])^2 
	}

	#calculate the min distance to any transport hub
	minD <- vector(mode = "numeric" , length=nrow(data) )
	for ( i in seq(1, nrow(M)) )
	{
		minD[i] <- min(M[i,])
	}
	
	min.distance <- sum ( minD )
}


#min.distance cpp function
library(inline)
src <-"
  Rcpp::NumericMatrix datacpp(data); //nx2 matrix
  Rcpp::NumericVector parcpp(par); //vector
  int nrowdata = datacpp.nrow();
  int szpar = parcpp.size();
  int hszpar = szpar / 2;

  // Calculate square distance from each point to each transport hub
  std::vector< std::vector > M ( nrowdata , std::vector ( hszpar 

) );
  for (int i=0; i<nrowdata; ++i)
  {
	for (int j=0; j<szpar-1; j+=2)
	{
		M[i][j/2] = pow( (datacpp(i,0) - parcpp[j]), 2.0) + pow((datacpp(i,1) - parcpp[j+1]) , 2.0);
	}
  }

  // Calculate the min distance to any transport hub
  double sssd=0;
  double smallest=0;
  for (int i=0; i<nrowdata; ++i)
  {
	smallest = M[i][0];
	for (int j=0; j<hszpar; ++j)
	{
		if ( M[i][j] < smallest ) smallest = M[i][j];
	}
	sssd += smallest;
  }

  return wrap(sssd);
"
min.distance.c <- cxxfunction(signature(data="numeric", par="numeric"), body=src, plugin="Rcpp")


library ( rbenchmark )
benchmark (
	optim(par = hubs, min.distance, data = buildings , method = "BFGS"),
	optim(par = hubs, min.distance.c, data = mbuildings , method = "BFGS"),
	columns=c("test", "replications", "elapsed", "relative"),
	order="relative", replications=10)

	
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s