How efficient is the placement of London tube and train stations with respect to minimising walking distance?

A colleague and friend of mine James Ramsden asked me if the results of my London tube station walkability study would be any different if I included train stations? This is a great question considering that North East and South East London is not particularly well served by tubes but is well served by the trains.

To answer this question I got my hands on the UK NaPTAN (National Public Transport Access Nodes) Dataset to complement my TFL tube station data.

To What Extent does Adding Train stations to the 10K radius London Tube Network improve the walkability of the service?

As expected by adding the train stations the transport network becomes more efficient than a random model which we have been using as a baseline – but not impressively so. Tube & train station placement is merely 17% better than random when measuring efficiency in terms of our walking distance metric. When looking at our optimised placement it is merely 27% better than the random model! This seems to be telling us that placing transport nodes to improve walking distances is hard but possible.
                                  Min.  1st Qu.   Median     Mean   3rd Qu.     Max.     SSSD           Efficiency
Random Allocation  1.82  346.70     527.50   561.60   732.60  1675.00 61229814         0%
Tube & Trains         1.046 272.40  421.40  466.30  599.500 1997.00 50840078    16.968%
Optimised Network  0.999 213.30  343.00  405.90  546.900 1820.00 44250822    27.729%

As can be seen above there is still theoretically room for improvement within the tube and train network of London.

To understand how this analysis was performed check out my original blog post on ‘How efficient is the placement of London Tube stations in minimising walking distances?’.

Histogram of Random Allocation of Tube and train stations
Histogram of Random Allocation of Tube and train stations
Histogram of Tube and Train stations as is
Histogram of Tube and Train stations as is
Histogram of optimised placement of tube and train stations
Histogram of optimised placement of tube and train stations
Optimised placement of tube and train stations
Optimised placement of tube and train stations

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
#many stations case using empirical data


library(maptools)
setwd("C:\\Users\\David\\Documents\\GIS\\LondonTubeStations")
tubestations <- readShapeSpatial("10K_Buf_LondonStations.shp")
trainstations <- readShapeSpatial("10K_Buf_LondonTrainStations.shp")
buildings <- readShapeSpatial("10K_Buf_Westminster_Bldg_Centroids.shp")

#load building data into dataframe
buildings=data.frame(
	     px=coordinates(buildings)[,1], 
              py=coordinates(buildings)[,2]
)

#load tubedata into vector
hubs <- vector(mode="numeric" , 2 * nrow(coordinates(tubestations)) )
j=1
for ( i in seq(1, nrow(coordinates(tubestations)) ) )
{

	hubs[j] <- coordinates(tubestations)[i,1]
	j=j+1
	hubs[j] <- coordinates(tubestations)[i,2]
	j=j+1
}

#load traindata into vector
trainhubs <- vector(mode="numeric" , 2 * nrow(coordinates(trainstations)) )
j=1
for ( i in seq(1, nrow(coordinates(trainstations)) ) )
{

	trainhubs[j] <- coordinates(trainstations)[i,1]
	j=j+1
	trainhubs[j] <- coordinates(trainstations)[i,2]
	j=j+1
}

#include traindata with tubedata
hubs = c(hubs,trainhubs)

#generate random data for benchmark
minX = min ( coordinates(tubestations)[,1] )
minY = min ( coordinates(tubestations)[,2] )
maxX = max ( coordinates(tubestations)[,1] )
maxY = max ( coordinates(tubestations)[,2] )

minX = min (minX , min ( coordinates(trainstations)[,1] ))
minY = min (minY , min ( coordinates(trainstations)[,2] ))
maxX = max (maxX , max ( coordinates(trainstations)[,1] ))
maxY = max (maxY , max ( coordinates(trainstations)[,2] ))

rndhubs <- vector(mode="numeric" , length(hubs) )
j=1
for ( i in seq(1, nrow(coordinates(tubestations)) + nrow(coordinates(trainstations))) )
{

	rndhubs[j] <- runif( 1 , minX , maxX )
	j=j+1
	rndhubs[j] <- runif( 1 , minY , maxY )
	j=j+1
}

#min.distance.measure function for looking at distribution of distances
min.distance.compare <- function(data, par) {
	
	#calculate the distance from each point to each transport hub
	M <- matrix( nrow=nrow(data) , ncol=length(par)/2 )
	for ( i in seq(1, length(par), by=2) )
	{
		col <- (i+1)/2
		M[,col] <- (data$px-par[i])^2  + (data$py-par[i+1])^2 
	}

	#calculate the min distance to any transport hub
	minD <- vector(mode = "numeric" , length=nrow(data) )
	for ( i in seq(1, nrow(M)) )
	{
		minD[i] <- min(M[i,])
	}
	
	min.distance.compare <- sqrt( minD ) #use this for comparisons in meters
}

#min.distance.c function
library(inline)
src <-"
  Rcpp::NumericMatrix datacpp(data); //nx2 matrix
  Rcpp::NumericVector parcpp(par); //vector
  int nrowdata = datacpp.nrow();
  int szpar = parcpp.size();
  int hszpar = szpar / 2;

  // Calculate square distance from each point to each transport hub
  std::vector< std::vector<double> > M ( nrowdata , std::vector<double> ( hszpar ) );
  for (int i=0; i<nrowdata; ++i)
  {
	for (int j=0; j<szpar-1; j+=2)
	{
		M[i][j/2] = pow( (datacpp(i,0) - parcpp[j]), 2.0) + pow((datacpp(i,1) - parcpp[j+1]) , 2.0);
	}
  }

  // Calculate the min distance to any transport hub
  double sssd=0;
  double smallest=0;
  for (int i=0; i<nrowdata; ++i)
  {
	smallest = M[i][0];
	for (int j=0; j<hszpar; ++j)
	{
		if ( M[i][j] < smallest ) smallest = M[i][j];
	}
	sssd += smallest;
  }

  return wrap(sssd);
"
min.distance.c <- cxxfunction(signature(data="numeric", par="numeric"), body=src, plugin="Rcpp")

#find minimum distance for empirical data
dist <- min.distance.compare ( buildings , hubs)
summary(dist)
hist(dist , breaks = "Sturges")

#find minimum distance for random data
rnddist <- min.distance.compare ( buildings , rndhubs)
summary(rnddist)
hist(rnddist , breaks = "Sturges")

#optimisation code
mbuildings <- as.matrix(buildings)
result <- optim(par = hubs, min.distance.c, data = mbuildings , method = "BFGS")

#display results
plot ( buildings )
for ( i in seq(1, length(result$par), by=2) )
{
	points ( result$par[i] , result$par[i+1] , col = "blue", cex = 1.5)
}

#find minimum distance for optimised empirical data
dist <- min.distance.compare ( buildings , result$par)
summary(dist)
hist(dist , breaks = "Sturges")

#write points to shapefile for visualisation
xcord <- vector(mode="numeric" , length=length(result$par)/2)
ycord <- vector(mode="numeric" , length=length(result$par)/2)
for ( i in seq(1, length(result$par), by=2) )
{
	xcord[(i+1)/2] <- result$par[i]
	ycord[(i+1)/2] <- result$par[i+1]
}
coords <- cbind(xcord,ycord)
spdf <- SpatialPointsDataFrame ( coords , data.frame(1:nrow(coords)))
writeSpatialShape( spdf , "10K_Buf_LondonTubenTrainStations_optim.shp")

#scoring
rnd <- 61229814
TfL <- 50840078
popt <- 44250822

score <- function ( n )
{
	100*(rnd-n)/rnd
}
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s