**Curiosity causes me to ask questions… ** For the sake of curiosity I was interested in finding out:

- “
*How efficient is the placement of London Tube stations with respect to minimizing walking distances to potential destinations i.e. the nearby buildings?*“.

There are a number of reasons why this is interesting to me:

- So many people commute everyday that reducing the walking distances could lead to more efficient transit;
- The same area could be covered by fewer stations, or with improved station placement a greater area could be covered with the same number of stations;
- Wouldn’t it be cool to measure how far from optimal the tube station placements are for walking distances 🙂

**How do you even begin to answer the above question?** Well firstly we require a method for calculating the optimal placement of a set of places (tube stations) to minimize travel distance to another set of places ( e.g. buildings). This turns out to be a pretty simple problem to solve if you represent each place as a point.

*The Very Simple Single Station Case*

In the single station case you can simply minimize the sum of the distances between your station and all destination places. Here is some sample R code for you to run yourself:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
#1 Station case optimisation case buildings=data.frame( px=rnorm(10, 0 , 1), py=rnorm(10, 0 , 1) ) min.distance <- function(data, par) { with(data, { sum ( (px-par[1])^2 + (py-par[2])^2 ) }) } result <- optim(par = c(0, 1), min.distance, data = buildings) plot ( buildings ) points ( result$par[1] , result$par[2] , col = "blue", cex = 1.5) |

The code above does the following:

- creates 10 randomly placed points ( buildings )
- defines a function to minimize called min.distance which calculates the sum of the distances between the buildings (data) and coordinates to optimize par[1] and par[2] (station coordinates)*.
- finds the optimal x and y coordinates using optim
- plots the buildings
- draws a blue point representing the optimal placement of the station.

*If you have particularly eager eyes you will notice that I am optimizing against distance squared rather than distance. This is purely a tiny optimization to avoid performing the square root optimization – we are able to do this because the values that minimize the distance squared are exactly the same as those that minimize the square root of distance squared. *The *

*Simple Multiple Station Case*

In the multiple station case you can no longer simply minimize the sum of the distances of your station. If you try this strategy all your stations will be on top of each other. Instead you need to minimize the sum of shortest distances from any building to the closest station. At first this may sound tricky but it is surprisingly simple. The simplest way to do this is to calculate the distance from all stations to all buildings and then identify the minimum value for each building. This can be done creating a matrix M where each row represents the distances from a building to every station. See my R code below:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
#many stations case buildings=data.frame( px=rnorm(10, 0 , 1), py=rnorm(10, 0 , 1) ) hubs=c( 0 , 0 , -1 , 1 , 1 , -1 ) min.distance <- function(data, par) { #calculate the distance from each point to each transport hub M <- matrix( nrow=nrow(data) , ncol=length(par)/2 ) for ( i in seq(1, length(par), by=2) ) { col <- (i+1)/2 M[,col] <- (data$px-par[i])^2 + (data$py-par[i+1])^2 } #calculate the min distance to any transport hub minD <- vector(mode = "numeric" , length=nrow(data) ) for ( i in seq(1, nrow(M)) ) { minD[i] <- min(M[i,]) } min.distance <- sum ( minD ) } result <- optim(par = hubs, min.distance, data = buildings , method = "BFGS") plot ( buildings ) for ( i in seq(1, length(result $par), by=2) ) { points ( result$par[i] , result$par[i+1] , col = "blue", cex = 1.5) } |

The above code only significantly differs from the single station example in the min.distance function. The important differences are as follows:

- We create a matrix M with the same number of rows as buildings and the same number columns as tube stations (hubs) – we calculate hub number by dividing the number of input params ( values to optimize) by 2;
- We create a for loop to iterate through each station calculating the distances to all buildings;
- We create a vector minD to store the minimum distance to a station for each building;
- We iterate through each row finding the shortest distance to a station and storing it in minD;
- We return the sum of minD which is the ‘sum of shortest distances from any building to the closest station’.

**Scaling this up to London?**

I will cover this in my next blog post. It will be more or less a case of heading over to http://www.openstreetmap.org/#map=10/51.5147/-0.2005 to download some real data.

I will be downloading and pre-processing the data using QGIS and then exporting the data as a SHP file and using maptools to import the data – see R Spatial CheatSheet.

I will complete this blog post soon to show you how-to do this. Time for bed 🙂