How do you create the social network of your office using R?

Recently I have been analysing Skype session data and personal location tracking data to understand how people use office spaces. Skype session data can give you a high-level approximation of office utilisation based on the number and duration of calls a person makes or receives per day on average. For example, if a person makes or receives 0 calls on a day when they would normally make or receive 5, you can estimate the probability that they were away from their desk. Similarly, individual location tracking data gives you the location of a person at a given time (with some amount of error), which indicates which parts of a building they use.
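As a rough illustration of that inference, here is a minimal sketch. It assumes a person's daily call count at their desk follows a Poisson distribution, which is my assumption and not something the Skype data guarantees. The rate of 5 calls per day comes from the example above; the prior `p_away` of 0.1 is hypothetical.

```r
# Assumption: daily call counts while at the desk are Poisson with mean 5.
rate_present <- 5
# Hypothetical prior probability that the person is away on any given day.
p_away <- 0.1

# Probability of seeing zero calls if the person WAS at their desk.
p_zero_given_present <- dpois(0, lambda = rate_present)

# Assumption: someone who is away always generates zero calls.
p_zero_given_away <- 1

# Bayes' rule: probability the person was away given that zero calls were seen.
p_away_given_zero <- (p_zero_given_away * p_away) /
  (p_zero_given_away * p_away + p_zero_given_present * (1 - p_away))

print(p_zero_given_present)
print(p_away_given_zero)
```

Under these assumptions a zero-call day is very unlikely for someone who is present (`exp(-5)`, roughly 0.007), so observing one shifts the estimate strongly towards "away".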

In this post, instead of being interested in space, I want to reveal the structure of the interactions between people. So I will construct a network of interactions between people and then use a community detection algorithm (walktrap) to identify the groups / communities within the network. All of this will be produced using two of my favourite packages, sqldf and igraph.

The Data

For the purposes of this post, I’ve included two example csv files: one that describes the people (people), which has been stripped of identifying info for anonymity, and another that details the interactions (interactiondata). Both files are very simple. People.csv contains only a personid, and interactiondata.csv contains only the ids of the two persons interacting and the duration.
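If the hosted files are unavailable, the rest of the post can be followed with a small stand-in data set. The sketch below builds two data frames with the column names the code in this post expects (PersonID, Person1ID, Person2ID, Duration). The names `toy_users` and `toy_interactions` and all of the values are made up for illustration.

```r
# Made-up stand-in for the people file: one row per person, ID only.
toy_users <- data.frame(PersonID = 1:5)

# Made-up stand-in for the interaction file: one row per interaction.
toy_interactions <- data.frame(
  Person1ID = c(1, 1, 2, 3, 4, 4),
  Person2ID = c(2, 2, 3, 1, 5, 5),
  Duration  = c(10, 5, 20, 15, 30, 2)
)

# Inspect the structure of both tables.
str(toy_users)
str(toy_interactions)
```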

The Code

Read in the data from this blog:

#Read the interaction data
interactiondata <- read.csv("https://logicalerrors.files.wordpress.com/2015/11/interactiondata.xls" , stringsAsFactors = FALSE)

#Read the people data
users <- read.csv("https://logicalerrors.files.wordpress.com/2015/11/people1.xls" , stringsAsFactors = FALSE)

Use SQL Group By Query to Summarise interactions between each person:

#Use SQL Group By Query to Summarise interactions between each person
library(sqldf)
nte <- sqldf("SELECT Person1ID, Person2ID, SUM(Duration) FROM interactiondata GROUP BY Person1ID, Person2ID", drv = "SQLite")
names(nte) <- c("Person1ID", "Person2ID", "CallMinutes")
nte <- subset(nte, !is.na(as.numeric(Person1ID))) #remove entries with invalid personid
nte <- subset(nte, !is.na(as.numeric(Person2ID))) #remove entries with invalid personid
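
For readers who do not have sqldf installed, the same kind of summary can be sketched in base R with aggregate(). The data frame below is a made-up example; only the column names match the real file, and `toy_nte` is a hypothetical name.

```r
# Made-up interactions; only the column names mirror interactiondata.
toy_interactions <- data.frame(
  Person1ID = c(1, 1, 2, 3),
  Person2ID = c(2, 2, 3, 1),
  Duration  = c(10, 5, 20, 15)
)

# Sum Duration for every (Person1ID, Person2ID) pair, like the GROUP BY query.
toy_nte <- aggregate(Duration ~ Person1ID + Person2ID,
                     data = toy_interactions, FUN = sum)
names(toy_nte)[names(toy_nte) == "Duration"] <- "CallMinutes"

print(toy_nte)
```

Note that the pairs (1, 2) and (2, 1) would be kept as separate rows by both approaches, because the grouping is on the ordered pair of columns.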

Filter out interactions with people not listed in users:

#Filter out interactions with people not listed in users
nte <- with(nte, nte[Person1ID %in%  users$PersonID, ])
nte <- with(nte, nte[Person2ID %in%  users$PersonID, ])

Add total interaction duration for each user:

#Add total interaction duration for each user
callee.Time <- sqldf("SELECT Person1ID, SUM(Duration) FROM interactiondata GROUP BY Person1ID" ,drv='SQLite') #sum interactions where person is person1id
names ( callee.Time) <- c("Person1ID", "CalleeMinutes")

caller.Time <- sqldf("SELECT Person2ID, SUM(Duration) FROM interactiondata GROUP BY Person2ID" ,drv='SQLite') #sum interactions where person is person2id
names ( caller.Time) <- c("Person2ID", "CallerMinutes")

users <- merge(users, callee.Time, by.x = "PersonID", by.y = "Person1ID", all.x = TRUE) #'glue' CalleeMinutes to users
users <- merge(users, caller.Time, by.x = "PersonID", by.y = "Person2ID", all.x = TRUE) #'glue' CallerMinutes to users
users$CalleeMinutes[is.na(users$CalleeMinutes)] <- 0 #Assign 0 where the merge left CalleeMinutes missing (NA)
users$CallerMinutes[is.na(users$CallerMinutes)] <- 0 #Assign 0 where the merge left CallerMinutes missing (NA)
users$Interaction <- as.numeric(users$CallerMinutes) + as.numeric(users$CalleeMinutes) #sum to calc total interaction
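
The two queries and two merges can also be collapsed into one step by stacking both ID columns and summing once. This is an alternative sketch in base R on made-up data, not the approach used above; `stacked` and `totals` are hypothetical names.

```r
# Made-up people and interactions; only the column names mirror the real files.
toy_users <- data.frame(PersonID = 1:4)
toy_interactions <- data.frame(
  Person1ID = c(1, 1, 2),
  Person2ID = c(2, 3, 3),
  Duration  = c(10, 5, 20)
)

# Stack both ID columns so each interaction counts once for each participant.
stacked <- data.frame(
  PersonID = c(toy_interactions$Person1ID, toy_interactions$Person2ID),
  Duration = rep(toy_interactions$Duration, times = 2)
)

# Total minutes per person, whichever side of the interaction they were on.
totals <- aggregate(Duration ~ PersonID, data = stacked, FUN = sum)
names(totals)[2] <- "Interaction"

# Left join so people with no interactions are kept, then set them to 0.
toy_users <- merge(toy_users, totals, by = "PersonID", all.x = TRUE)
toy_users$Interaction[is.na(toy_users$Interaction)] <- 0

print(toy_users)
```

Person 4 never appears in the made-up interactions, so they survive the left join with a total of 0 rather than being dropped.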

Create Nodes and Links data.frames for igraph:

#Create Links
from <- as.character(nte$Person1ID)
to <- as.character(nte$Person2ID)
weight <- nte$CallMinutes #edge weight = total call minutes (used by walktrap)
Links <- data.frame( from, to, weight)

#Create nodes
PersonID <- as.character(users$PersonID)
TotalMinutes <- users$Interaction
slabel <- users$PersonID
Nodes <- data.frame ( PersonID, TotalMinutes )

Use igraph to create the network, analyse and plot:

#igraph Analysis
library(igraph)
g.network <- graph_from_data_frame(Links, directed = FALSE, vertices = Nodes) #graph.data.frame() is the older name for this function
wc <- cluster_walktrap(g.network , weights = E(g.network)$weight )
plot(wc, g.network, main="Social Network of Office")
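
Because the graph is undirected, a pair that appears in Links as both (A, B) and (B, A) ends up with two parallel edges. Here is a hedged sketch of one way to merge them and inspect the result, using igraph's simplify(), membership() and modularity() on a made-up edge list (`toy_links` and its values are hypothetical):

```r
library(igraph)

# Made-up edge list: two tight triangles joined by one weak link.
toy_links <- data.frame(
  from   = c("1", "2", "1", "2", "4", "5", "4", "3"),
  to     = c("2", "1", "3", "3", "5", "6", "6", "4"),
  weight = c(30, 20, 40, 50, 60, 45, 55, 1)
)

g <- graph_from_data_frame(toy_links, directed = FALSE)

# Merge parallel edges (e.g. 1-2 and 2-1) by summing their weights.
g <- simplify(g, edge.attr.comb = list(weight = "sum"))

wc <- cluster_walktrap(g, weights = E(g)$weight)

# Which community each person was assigned to, and the modularity of the split.
print(membership(wc))
print(modularity(wc))
```

Whether to sum, average or keep the parallel edges is a modelling choice; summing treats minutes in either direction as the same kind of interaction.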

Visualisation

The walktrap algorithm does a good job of identifying the groups / communities within the social network. In the visualisation below I weighted the links by the number of minutes each pair of people interacted. Interestingly, this revealed two distinct groups whose members interact heavily with each other, and one outlier who interacts with only one person.
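
One thing to be aware of: igraph's plot() does not draw thicker lines just because an edge has a weight attribute, so the width has to be passed explicitly through edge.width. A minimal sketch on a made-up graph, where the rescaling to a 1 to 6 range is an arbitrary choice of mine:

```r
library(igraph)

# Made-up edge list with call minutes as weights.
toy_links <- data.frame(
  from   = c("1", "1", "2", "3"),
  to     = c("2", "3", "3", "4"),
  weight = c(5, 60, 30, 120)
)
g <- graph_from_data_frame(toy_links, directed = FALSE)
wc <- cluster_walktrap(g, weights = E(g)$weight)

# Rescale call minutes to line widths between 1 and 6 (arbitrary range).
w <- E(g)$weight
edge_width <- 1 + 5 * (w - min(w)) / (max(w) - min(w))

# Pass the widths explicitly; extra arguments are forwarded to the igraph plot.
plot(wc, g, edge.width = edge_width, main = "Social Network of Office")
```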

[Image: OfficeNetwork]
