The goal of this tutorial is to do the following:

  1. Collect addresses (via Google Forms)
  2. Download to R (via googlesheets)
  3. Geocode them (via geocode)
  4. Plot them (using leaflet)
  5. Get driving distance between them (via gmapsdistance)
  6. Cluster them (kmeans)
  7. Making the leaflet plot fancy

1. Collect

Perhaps in a future post I’ll explore googleformr. For now, I create forms the old-school way.

Create a Form:

  1. Visit drive.google.com
  2. Create a Form
  3. Collect responses (google a random “fake address generator” for inspiration)
  4. Save the responses to a new sheet fake_addresses

Steps in pretty picture form:

2. Connect and Download

I use the googlesheets API to connect to the google sheet of choice. Tutorial for first-timers. Also, this vignette .

devtools::install_github("dkahle/ggmap") # for the register_google function
devtools::install_github("rodazuero/gmapsdistance")
install.load::install_load('gmapsdistance', 'googlesheets', 'dplyr', 'leaflet', 'magrittr', 'ggmap', 'cluster', 'factoextra')

Run this to authorize access to your Google account. It’s pretty nice!

gs_auth(new_user=TRUE)

Identify your sheet name and pull the data. Write to disk to save time in the future.

wb_name <- gs_title('fake_addresses')
# Or, if published to web, enter the key after /d/ in the url:
# wb_name <- gs_key('198ng981...')
wb <- gs_read(wb_name)
# Save it to csv (you could skip this)
wb %>% write.csv('data/20180720/addresses.csv')

Read it in, inspect, and clean.

wb <- read.csv('data/20180720/addresses.csv')
wb %<>% select(-X, -Timestamp)
colnames(wb) <- c('address', 'favorite_color', 'favorite_food')
wb$address %<>% as.character() # geocode requires a character, not level
wb %>% knitr::kable()
address favorite_color favorite_food
3094 Bell Street, New York, New York Blue Peanuts
1181 Williams Avenue, New York, New York Blue Salmon
4289 Taylor Street, new york, new york Green Salmon
2239 Duncan Avenue, new york, new york Red Caviar
2379 Grove Street, new york, ny Blue Salmon
302 Pallet Street, new york, new york Red Caviar
2664 Oakwood Avenue, new york, new york Red Peanuts
3897 Hoffman Avenue, new york, new york Green Peanuts

3. Geocode the addresses

To use the geocode funciton, you may need to create a google maps API if you over-do your quota (daily limit is 2500). See footnotes for API help.

# load the googlemaps api
api_key = trimws(readChar('data/google_maps_api', 1000))
register_google(key = api_key)
substr(api_key, 1, 3)
## [1] "AIz"

Use geocode() for each address. (This code was taken from somewhere, though I can’t remember where.)

df <- wb
df$lon <- 0
df$lat <- 0
df$geoAddress <- ""

for(i in 1:nrow(df))
{
  if(!is.na(df$address[i])){
    result <- geocode(df$address[i], output = "latlona", source = "google")
    df$lon[i] <- as.numeric(result[1])
    df$lat[i] <- as.numeric(result[2])
    df$geoAddress[i] <- as.character(result[3])
  }
}
df %>% knitr::kable()
X address favorite_color favorite_food lon lat geoAddress
1 3094 Bell Street, New York, New York Blue Peanuts -74.07667 40.61294 bell st, staten island, ny 10305, usa
2 1181 Williams Avenue, New York, New York Blue Salmon -73.89131 40.65006 1181 williams ave, brooklyn, ny 11236, usa
3 4289 Taylor Street, new york, new york Green Salmon -74.12303 40.63378 taylor st, staten island, ny 10310, usa
4 2239 Duncan Avenue, new york, new york Red Caviar -73.85843 40.87307 duncan st, bronx, ny 10469, usa
5 2379 Grove Street, new york, ny Blue Salmon -74.00398 40.73292 grove st, new york, ny 10014, usa
6 302 Pallet Street, new york, new york Red Caviar -73.97018 40.75204 300-302 e 46th st, new york, ny 10017, usa
7 2664 Oakwood Avenue, new york, new york Red Peanuts -73.66295 42.76183 oakwood ave, troy, ny, usa
8 3897 Hoffman Avenue, new york, new york Green Peanuts -73.72631 40.70106 3897 hoffman ave, elmont, ny 11003, usa

4. Plot in leaflet

Lets visualize our people using leaflet.

leafMap <- df %>%
   leaflet() %>% 
    addTiles() %>%
    addMarkers(~lon, ~lat)
leafMap

We see we got one person all the way up in Troy.

5. Get driving distance between points

With gmapsdistance you can calculate the walking, driving, biking, and public transportation distance between two coordinates. (See docs.)

# This set.api.key is from gmapsdistance (as opposed to ggmap). You need to install the github version if you want `driving` to work as of this post. See https://github.com/rodazuero/gmapsdistance/issues/17
set.api.key(api_key)

# Get the distance between two locations
df %<>% mutate(latlon = paste(lat, '+', lon, sep = ''))
results = gmapsdistance(origin = df$latlon[1], destination = df$latlon[2], mode = 'driving', shape = 'long')
results
## $Time
## [1] 1929
## 
## $Distance
## [1] 28301
## 
## $Status
## [1] "OK"

The results Time are the time in seconds it takes to go between the two points, and the Distance is in meters.

Now, if you want, you can use this to calculate the driving distance between all of your addresses. Or, use it to optimize a route. But we’ll stop our use here. I just explored that for fun.

6. Clustering people on the map

Lets say you want to cluster your peoples into one of three groups. Pretty simple with kmeans clustering (or another clustering method).

mat <- df %>% select(lat, lon) %>% as.matrix()
k <- kmeans(mat, centers=3)
df$k_clust <- k$cluster
df %>% select(lat, lon, k_clust) %>% knitr::kable()
lat lon k_clust
40.61294 -74.07667 3
40.65006 -73.89131 2
40.63378 -74.12303 3
40.87307 -73.85843 2
40.73292 -74.00398 3
40.75204 -73.97018 3
42.76183 -73.66295 1
40.70106 -73.72631 2

Now lets see these clusters in a map.

# Function to color
df$mcolor <- plyr::mapvalues(df$k_clust, from=c(1, 2, 3), 
                             to=c('blue', 'orange', 'green'))
# Create new markers
icons <- awesomeIcons(markerColor = df$mcolor)

leafMap <- df %>%
  leaflet() %>% 
  addTiles() %>%
  addAwesomeMarkers(~lon, ~lat, icon=icons)
leafMap

7. Making the plot all fancy

I don’t really like this default leaflet map, so lets up it. Also, lets visualize the favorite food and add some meaningful labels.

Changing the icons:

df$icons <- plyr::mapvalues(df$favorite_food, from=c("Peanuts", "Salmon", "Caviar"), 
                             to=c('bus', 'bicycle', 'car'))

icons <- awesomeIcons(markerColor = df$mcolor, 
                      library = 'fa', #fontawesome (google the icons)
                      icon = df$icons)

df %>%
  leaflet() %>% 
  addTiles() %>%
  addAwesomeMarkers(~lon, ~lat, icon=icons)

Changing the basemap to make the colors ‘pop’ a little more:

df %>%
  leaflet() %>% 
  addTiles() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addAwesomeMarkers(~lon, ~lat, icon=icons)

(Other basemap examples here: https://leaflet-extras.github.io/leaflet-providers/preview/)

Then, let’s finish off with some pop-up labels (click to see the action):

df %<>% mutate(
 popup = paste(sep = "<br/>", 
               paste("<b>Favorite Food:", favorite_food, "</b>"), 
               paste("Favorite Color:", favorite_color), 
               address)
)

df %>%
  leaflet() %>% 
  addTiles() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addAwesomeMarkers(~lon, ~lat, icon=icons,
                    label = ~address,
                    popup = ~popup) 

Google maps API

You don’t have to use the API if you have less than 2500 calls per day. But if you want ot use geocode and gmapsdistnace, you have to enable Distance Matrix API and Geocoding API on the Google Maps API. It’s fairly easy: just create an account and find those two boxes to click.