Back to posts

Using ggplot2 to map margins of victory in three early presidential primaries (Part 2)

Posted on 8th May 2016

Part 1 described the parsing of JSON files obtained from the CNN election results and the calculation of margins of victory for what appear to be six very different races. ggplot2 is an incredibly powerful package, but the development of complicated charts can sometimes be a little daunting. This post describes the construction of maps that display the margins of victory for each party’s race, side-by-side, using ggplot2, labeling, digest, and gridExtra packages.

The maps package was used to prepare shapefiles for the South Carolina and Wisconsin results by county. Political boundaries for the New Hampshire results were obtained from data provided by the University of New Hampshire Complex Systems Research Center, “New Hampshire Political Boundaries at 1:24,000 Scale”. Map datasets were prepared using sp, rgdal, and maptools packages.

Ultimately, I want two maps, each with their own titles, set dynamically, and I want them to share only one of their two respective legends.

MapLayout

An example of how to set up the titles for the first part of this task appears below, using a helpful tip.

rvotes<-as.character(format(sum(rep$tvotes),big.mark=","))
plotr.title <-"Wisconsin Republican Primary: Margins of Victory"
plotr.subtitle <-paste("Total popular votes: ", rvotes)

Data preparation mainly involved loading the shapefiles and merging them with the results data. The South Carolina and Wisconsin datasets, divided into county regions, were relatively simple to prepare using the map_data() function in maps. Preparation of the New Hampshire dataset was more involved and the code is provided at the end of this post.

load("widem2.Rda") # data prepared in Part 1
load("wirep3.Rda") # data prepared in Part 2
rep<-wirep3
dem<-widem2
codf <-subset(map_data("county"),region=='wisconsin') # County shapefile. 
dem$county <- tolower(dem$name) # Prepare to merge on county name 
rep$county <- tolower(rep$name)

# Find the first occurrence of each county in the map file. 
# Identify any counties on the results file that aren't in the shapefile 
# (or are spelled differently).
m <-as.data.frame(match(dem$county,codf$subregion)) # check for mis-matches
m$L1<-seq.int(nrow(m))
nas <- m[rowSums(is.na(m)) > 0,] 
nas <- merge(nas,dem,by="L1")

# Modify the spelling of St. Croix on the results file before merging.
dem$county <-ifelse(dem$county=='st. croix','st croix',dem$county)
rep$county <-ifelse(rep$county=='st. croix','st croix',rep$county)

# Merge the results files to the shapefile
names(codf)[names(codf)=="subregion"] <- "county"

demplot <- merge(x=codf, y=dem, by = "county",all.x=TRUE)
cogps<-unique(demplot[c("county","group")])
rownames(cogps) <- NULL

repplot <- merge(x=codf, y=rep, by = "county",all.x=TRUE)
cogpsr<-unique(repplot[c("county","group")])
rownames(cogpsr) <- NULL

To overlay a set of markers that identify the winner of each county election (or the name, or the proportion of voters...), a separate data frame that contains the information to be charted with a single set of coordinates for plotting is needed.

colabels <- aggregate(cbind(long, lat) ~ county, data=demplot, 
                      FUN=function(x)mean(range(x)))
colabels <- merge(x=colabels, y=cogps, by = "county",all.x=TRUE) # get groups

colabels <- merge(x=subset(dem,select=c(Clinton.x,Sanders.x,county,Winner)),
                  y=colabels,by="county",all.x=TRUE)

Categories for the margins of victory were created by passing the margins to the cut() function and setting breaks at 0, .05, .15, and .5.

demplot <- demplot[order(demplot$margin), ]
demplot$margins <- cut(demplot$margin, 
    breaks = c(0,.05,.15,.5),
    include.lowest=TRUE)
summary(demplot$margins)

The next steps divide the map elements so that they can be used individually to create a single image using gridArrange() later in the program. Because the candidate lists are different, they can share the margins legend only. The candidate legends will be created separately.

The first call to ggplot creates the choropleth map, shaded to reflect the margins of victory, with coord_equal(ratio=1) called to keep distortion to a minimum. The legend for the margins is created in the first chart, using theme(legend.position="bottom" . The overlay is created in a second step, where the winner of each race appears as a point with a different color. The chart creation code for republicans and democrats appears below.

r<-ggplot(repplot, aes(x=long, y=lat, group = group)) +
  geom_polygon(aes(fill = margins), colour = alpha("white", 1/2), size = 0.2) +  
  geom_polygon(data = codf, colour = "white", fill = NA) +
  ggtitle(bquote(atop(.(plotr.title), atop(italic(.(plotr.subtitle)), "")))) +
  scale_fill_manual(name="Margins", values = c("#CC99CC","#996699","#663366")) + 
  coord_equal(ratio=1) + 
  theme(legend.position="bottom",
        axis.line=element_blank(),
        axis.text.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks=element_blank(),
        axis.title.x=element_blank(),
        axis.title.y=element_blank(),
        panel.background=element_blank())

# add points
r2<-r + geom_point(data = colabelsr,
                   aes(long, lat, colour=Winner),size=2) +
  scale_colour_manual(values=c("Cruz"="red", "Trump"="darkorange", "Kasich"="maroon",
 "Bush"="tomato", "Fiorina"="lightcoral",
 "Carson"="firbrick", "Rubio"="#b34d4d", "Christie"="#cc4400"))

# democrats
d<-ggplot(demplot, aes(x=long, y=lat, group = group)) +
  geom_polygon(aes(fill = margins), colour = alpha("white", 1/2), size = 0.2) +  
  geom_polygon(data = codf, colour = "white", fill = NA) + 
  ggtitle(bquote(atop(.(plotd.title), atop(italic(.(plotd.subtitle)), "")))) +
  scale_fill_manual(name="Margins (D)",
        values = c("#CC99CC","#996699","#663366")) + 
  coord_equal(ratio=1) + 
  theme(legend.position="bottom",
        axis.line=element_blank(),
        axis.text.x=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks=element_blank(),
        axis.title.x=element_blank(),
        axis.title.y=element_blank(),
        panel.background=element_blank())

# add points
d2<-d + geom_point(data = colabels,
                   aes(long, lat, colour=Winner),size=2) +
  scale_colour_manual(values=c("Clinton"="lightblue", "Sanders"="darkblue"))

I only want the candidates' legend element from the chart produced through ggplot below. I used prop, the proportion of the state's voters, but any value for y would do.

dpts <-ggplot(colabels,aes(prop,Winner, colour=Winner)) + 
  geom_point() +
  scale_colour_manual(values=c("Clinton"="lightblue", "Sanders"="darkblue")) +
  theme(legend.direction="horizontal")

rpts <-ggplot(colabelsr,aes(prop,Winner, colour=Winner)) + 
  geom_point() +
  scale_color_manual(values=c("Cruz"="red", "Trump"="darkorange", "Kasich"="maroon",
  "Bush"="tomato", "Fiorina"="lightcoral","Carson"="firbrick",
  "Rubio"="#b34d4d", "Christie"="#cc4400"))+
  theme(legend.direction="horizontal")

The next step uses a solution from Stack Overflow that can be used to allow multiple graphs to share one or more legends. Using this function once for each legend produces three graphical objects that will be placed on the canvas by grid.arrange.

g_legend<-function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend)}

mlegend <-g_legend(r)    # margins
dlegend <-g_legend(dpts) # democratic hopefuls
rlegend <-g_legend(rpts) # republican hopefuls

The next bit of code tells grid.arrange how the canvas should be laid out. the three arrangeGrob() function calls arrange the grobs into three sections.

grid.arrange(arrangeGrob(d2 + theme(legend.position="none"),
                         r2 + theme(legend.position="none"),
                         nrow=1),
             arrangeGrob(mlegend,nrow=1),
             arrangeGrob(dlegend,rlegend,nrow=2),
             ncol=1,
             heights=c(10, 1,1))

The final graphs appear below. Click to enlarge.

New Hampshire Primary Results
Tuesday, February 9, 2016

South Carolina Primary Results
Saturday, February 27, 2016

Wisconsin Primary Results
Tuesday, April 5, 2016

The following represents the steps used to create the New Hampshire map data source file.

require(sp)
require(rgdal)
library('maptools') 

conm <- readShapeSpatial("//NewHampshireShpPb/pbp")

dem$name <-ifelse(dem$name=="Harts Location","Hart's Location",dem$name)
dem$name <-ifelse(dem$name=='Waterville','Waterville Valley',dem$name)
dem$name <-ifelse(dem$name=="Wentworth's Location","Wentworths Location",dem$name)

rep$name <-ifelse(rep$name=="Harts Location","Hart's Location",rep$name)
rep$name <-ifelse(rep$name=='Waterville','Waterville Valley',rep$name)
rep$name <-ifelse(rep$name=="Wentworth's Location","Wentworths Location",rep$name)

names(rep)[names(rep)=="name"] <- "NAME"
names(dem)[names(dem)=="name"] <- "NAME"

names(conm)
conm<- fortify(conm,region="NAME")
conm$county=tolower(conm$id)

dem$county <- tolower(dem$NAME)
rep$county <- tolower(rep$NAME)

codf <-conm # use codf for remaining code

demplot <- merge(x=codf, y=dem, by = "county",all.x=TRUE)
cogps<-unique(demplot[c("county","group")])
rownames(cogps) <- NULL
demplot <- demplot[with(demplot, order(county,order)), ]