Force Directed Graph Construction Using d3ForceNetwork

Posted on 5th May 2016

This post describes the construction of a similarity matrix and its use in creating grouped network graphs to examine freshwater access in rural regions of 194 countries around the world. The data comes from the WHO/UNICEF Joint Monitoring Programme (JMP) for Water Supply and Sanitation, downloaded from The World Bank on December 26, 2013.

Dataset construction

To run the code, you’ll need Christopher Gandrud’s d3Network package.

setwd("C:/_Rproject/ForceDirected")
require('d3Network',lib.loc="c:/r/packages/") 

The following code snippet reads a .csv file containing two columns, Country (after removing any accents and diacritical marks) and Access_Rural, from the table linked above, strips trailing blanks off columns, and creates a data frame called water.

water <- read.csv(file="3.5_Freshwater_useForCooccurrence_clean.csv", strip.white=TRUE,
head=TRUE,sep=",", na.strings=c("."),
colClasses=c('character','numeric'))
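To see what these read.csv options do, here is a minimal sketch that parses a small inline table instead of the real file. The sample rows are made up for illustration; only the two column names come from the post.

```r
# Hypothetical sample rows; the real file is not reproduced here.
csv_text <- "Country,Access_Rural
Albania ,94
Andorra,.
Armenia,  98"

sample_water <- read.csv(text = csv_text, strip.white = TRUE,
                         header = TRUE, sep = ",", na.strings = c("."),
                         colClasses = c("character", "numeric"))

# The "." entry is read as NA and the blank after "Albania" is stripped.
str(sample_water)
```

Rows with a missing Access_Rural value stay in the data frame at this point; they are filtered out later, when the regional subset is built.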

The metadata, available with the table linked earlier, contains the table name, income group, currency, region, and other fields for each country. The following commands load the data into a data frame and subset the data frame to three columns of interest. I wanted High Income countries in one group, regardless of OECD membership status, so the group names are cleaned before converting the income group to an integer code.

meta <- read.csv(file="FreshwaterMeta.csv",strip.white=TRUE,
head=TRUE,sep=",", na.strings=c(" "))
meta <- subset(meta,Income.Group != "",select=c("Table.Name","Income.Group","Region"))
meta[2] <- lapply(meta[2], as.character)
meta$inc <-ifelse(substr(meta$Income.Group,1,1) =='H',"High income",meta$Income.Group)
meta$ecogrp <- as.integer(factor(meta$inc, levels=c("Low income","Lower middle income","Upper middle income","High income")))

water <-merge(water,meta, by.x = "Country", by.y = "Table.Name", all.x = TRUE)
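The recoding step can be checked on a handful of labels. This is a small sketch with made-up input; the label strings are assumed to resemble the metadata's income groups and are not taken from the file itself.

```r
# Hypothetical income-group labels, for illustration only.
income <- c("High income: OECD", "Low income", "High income: nonOECD",
            "Upper middle income", "Lower middle income")

# Collapse every label starting with "H" into a single "High income" group.
inc <- ifelse(substr(income, 1, 1) == "H", "High income", income)

# Listing the levels in order gives integer codes 1 (low) through 4 (high).
ecogrp <- as.integer(factor(inc, levels = c("Low income", "Lower middle income",
                                            "Upper middle income", "High income")))
print(data.frame(income, inc, ecogrp))
```

Spelling out the levels matters here: without the levels argument, factor() would order the groups alphabetically and the integer codes would no longer run from low to high income.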

Given the size of my drawing area, between 800 and 1000 pixels, I divided the data frame by region, to restrict the number of countries to a range of 50-70. The following command creates the data frame for the Europe & Central Asia region and restricts it to records with non-missing Access_Rural values. Other regional data frames, including those combining two regions, were created in the same manner.

waterECA <- subset(water,Region=="Europe & Central Asia" & !is.na(Access_Rural)) 

Matrix Construction

To create the similarity matrix, I began with a square matrix of zeros with a row for each country.

m <- matrix(0, nrow=nrow(waterECA), ncol=nrow(waterECA))

The waterECA data frame can now be used to populate m with a set of non-negative values, bounded between 0 and 100, that reflect the level of agreement between each pair of countries. In the matrix m, each element (i,j) represents the absolute difference in percentages between country i and country j, so smaller values indicate closer agreement.

for(i in seq_along(waterECA$Country)){
  for(j in seq_along(waterECA$Country)){
    m[i, j] <- abs(waterECA$Access_Rural[i]-waterECA$Access_Rural[j])
  }
}
rownames(m) <- waterECA$Country 
colnames(m) <- waterECA$Country
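As an aside, the same matrix can be built without explicit loops. The sketch below uses base R's outer() on made-up percentages for three placeholder countries and checks the result against the double loop; it is an alternative, not the code used for the post.

```r
# Made-up rural access percentages for three placeholder countries.
access <- c(A = 94, B = 98, C = 71)

# outer() forms every pairwise difference in one call; abs() makes it symmetric.
m_vec <- abs(outer(access, access, "-"))

# The double loop from the post, applied to the same toy data.
m_loop <- matrix(0, nrow = length(access), ncol = length(access),
                 dimnames = list(names(access), names(access)))
for (i in seq_along(access)) {
  for (j in seq_along(access)) {
    m_loop[i, j] <- abs(access[i] - access[j])
  }
}

# Both approaches should agree, including row and column names.
stopifnot(isTRUE(all.equal(m_vec, m_loop)))
print(m_vec)
```

Because outer() carries the vector's names over to the row and column names, the separate rownames() and colnames() assignments are not needed in this version.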

Only the elements above or below m’s diagonal are needed to create the set of edges for the graph. These next steps set m’s diagonal and upper triangle elements to NA, coerce m into a table of distinct country pairs and their corresponding similarity estimate, and subset the resulting data frame, links, to non-missing values.

m[upper.tri(m, diag=TRUE)] <- NA
links <- as.data.frame(as.table(m)) 
colnames(links)<-c("source","target","value")
links <- subset(links, !is.na(value))
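The effect of these steps is easiest to see on a tiny matrix. The following sketch uses a made-up 3 x 3 matrix with placeholder country names A, B and C.

```r
# Toy symmetric matrix of absolute differences (values are made up).
m_toy <- matrix(c( 0,  4, 23,
                   4,  0, 27,
                  23, 27,  0), nrow = 3, byrow = TRUE,
                dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Blank out the diagonal and upper triangle so each pair appears only once.
m_toy[upper.tri(m_toy, diag = TRUE)] <- NA

# as.table() followed by as.data.frame() yields one row per matrix cell:
# row name, column name, cell value.
edges <- as.data.frame(as.table(m_toy))
colnames(edges) <- c("source", "target", "value")
edges <- subset(edges, !is.na(value))
print(edges)
```

For n countries this leaves n * (n - 1) / 2 rows, one per distinct pair, with the row country as source and the column country as target.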

Before passing links to d3Network, these next steps assign zero-based integer indices to the source and target countries, because D3 identifies each node by its position in the nodes array, counting from zero. By default, the levels of “source” and “target” in this case are the unique, alphabetically sorted country names from the same file (waterECA), so I used R’s internal ordering of these factors to set the indices, using the as.integer() function to assign both.

links$sourceN <-as.integer(links$source) -1 # shift factor codes to a zero-based index
links$targetN <-as.integer(links$target) -1 # shift factor codes to a zero-based index
links <- subset(links,sourceN != targetN)
links <- subset(links,select=c("sourceN","targetN","value"))
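Since the graph is only correct if each index points at the right row of the node table, a quick check is worthwhile. The sketch below uses placeholder data (the names nodes_toy and edges_toy are hypothetical) and assumes the factor levels are in the same order as the node rows.

```r
# Placeholder node table, in the same order as the factor levels used below.
nodes_toy <- data.frame(Country = c("A", "B", "C"), ecogrp = c(4L, 1L, 3L),
                        stringsAsFactors = FALSE)

edges_toy <- data.frame(
  source = factor(c("B", "C", "C"), levels = nodes_toy$Country),
  target = factor(c("A", "A", "B"), levels = nodes_toy$Country),
  value  = c(4, 23, 27)
)

# Factor codes start at 1; D3 counts positions in the nodes array from 0.
edges_toy$sourceN <- as.integer(edges_toy$source) - 1L
edges_toy$targetN <- as.integer(edges_toy$target) - 1L

# Sanity check: adding 1 back must recover the matching row of the node table.
stopifnot(nodes_toy$Country[edges_toy$sourceN + 1] == as.character(edges_toy$source))
stopifnot(nodes_toy$Country[edges_toy$targetN + 1] == as.character(edges_toy$target))
print(edges_toy)
```

If the node table were sorted differently from the factor levels, these checks would fail, which is a cheap way to catch mislabeled nodes before drawing anything.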

The nodes data frame was created from unique values of the waterECA data frame.

nodes <-as.data.frame(unique(waterECA[,c("Country","ecogrp")]))

Graphing

d3Network’s d3ForceNetwork function will send the contents of the HTML file that displays the graph to the console unless the output is redirected. Since I have to modify the code slightly to render the graph in WordPress and make some other adjustments (described later), I called the sink function first to divert the output to a text file in my working directory.

sink("d3force-waterECA.txt")
d3ForceNetwork(Links = links, Nodes = nodes, 
Source = "sourceN", Target = "targetN", 
Value = "value", NodeID = "Country",
Group = "ecogrp", width = 800, height = 800, 
opacity = 0.9)
sink() # stop diverting output to the file
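For readers unfamiliar with sink, the sketch below shows the divert-and-restore pattern on its own. The print_html function is a stand-in that simply prints two lines of text, not d3ForceNetwork, and the temporary file path is only for illustration.

```r
# Stand-in for a function that prints HTML to the console.
print_html <- function() cat("<!DOCTYPE html>\n<body>graph goes here</body>\n")

out_file <- tempfile(fileext = ".txt")   # hypothetical output path

sink(out_file)   # start diverting console output to the file
print_html()
sink()           # called with no arguments: output returns to the console

# The file now holds what would otherwise have been printed.
html_lines <- readLines(out_file)
print(html_lines)
```

Until sink() is called again with no arguments, everything printed in the session keeps going to the file, so the closing call is easy to forget and worth pairing with the opening one.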

The output of d3ForceNetwork can be easily customized. For example, by default, the link distance is fixed and the values in the set of edges determine the stroke width. Because each node in this data is connected to every other node, without modification, the resulting graph looks like this.

Opening the text output file and varying the force layout’s linkDistance and charge attributes helped make the graph more readable.

Original output:

var force = d3.layout.force()
.nodes(d3.values(nodes)) 
.links(links) 
.size([width, height]) 
.linkDistance(50) 
.charge(-120) 
.on("tick", tick)
.start(); 

Sample modification:

.linkDistance(function(d) { return (d.value +1)*9; })
.charge(-1*Math.pow(nodes.length, 2)) 
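To get a feel for the scale of these settings, the two JavaScript expressions can be mirrored in R. The edge values and the node count of 55 below are assumptions chosen for illustration, not figures from the actual graph.

```r
# Absolute differences in percentages fall between 0 and 100.
value <- c(0, 10, 50, 100)

# Mirrors the JavaScript expression (d.value + 1) * 9.
link_distance <- (value + 1) * 9
print(link_distance)

# Mirrors -1 * Math.pow(nodes.length, 2) for a hypothetical 55-node graph.
n_nodes <- 55
charge <- -1 * n_nodes^2
print(charge)
```

Countries with identical access percentages get the shortest target distance (9 pixels) and the most dissimilar pairs the longest, so similar countries tend to cluster, while the stronger negative charge pushes the nodes of a fully connected graph further apart.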

Examples of graphs produced using this method appear under the interactives in the home page menu.