dplyr vs data.table

In the world of data science in R, the battle between dplyr and data.table is real. Here I compare their performance against base R commands for some common tasks. Who will be the winner on speed and simplicity?

Make random datasets

library(data.table)
set.seed(71)
size1 <- 4*10^6
size2 <- size1 * 0.1
df1 <- data.frame(id=paste0("SERVICE", 1:size1), value=rnorm(size1), stringsAsFactors=FALSE)
df2 <- data.frame(id=paste0("SERVICE", sample(1:size1, size2)), value=rnorm(size2), stringsAsFactors=FALSE)
dt1 <- data.table(df1)
dt2 <- data.table(df2)
# mtcars data
M <- data.

In this post, I’ll cover how to customize a few settings so that you can get your own custom domain for your EC2 instance, say, app.example.com. Anyone interested in customizing an EC2 instance can use this, not just those who build R Shiny apps. I assume you’ve already read part 1, which described how to launch an R Shiny app on EC2, and that you already have an EC2 instance running with some useful app.

I want to run R Shiny on AWS using Docker. Here’s how to do it. In part 2, I’ll demonstrate how to get a custom domain and make the URL look clean.

Useful background reading

If you’re already comfortable with Docker, skip to the next section.

* A great, simple tutorial on using Docker and Flask: short and sweet.
* Docker for Beginners: verbose and lengthy, but an excellent introduction that walks you through all the jargon.

BYU football

Goal of this post

* Answer some interesting questions about BYU football.
* Dive into different modeling approaches.

I don’t explain my thinking below, but some of the charts might be cool.

Some questions of interest

* How is Kilani Sitake doing in his second season compared to past BYU coaches?
* More challenging: how’s he doing relative to all second-season coaches?

library(dplyr)
df_in <- read.csv(file.path(fp_data, 'byu_seasons.csv')) %>% distinct()

Some basic questions:

* How many years do we have data on?

The goal of this tutorial is to do the following:

* Collect addresses (via Google Forms)
* Download them to R (via googlesheets)
* Geocode them (via geocode)
* Plot them (using leaflet)
* Get driving distances between them (via gmapsdistance)
* Cluster them (kmeans)
* Make the leaflet plot fancy

1. Collect

Perhaps in a future post I’ll explore googleformr. For now, I create forms the old-school way.

The beauty of open source is: “Oh, let me just download that package and I can do amazing things!” The reality is: “OK, I downloaded it, and I got the ‘hello world’ example working. But now, actually getting it to do what I want in the environment I want takes, like… 30 hours? Just one more bug and I’ll finally give up…”

Bugs I hit

I hit a lot of bugs when building my Leaflet tutorial.

So you want to buy a car, but you don’t know anything about them? Welcome to my life. You show up at the dealer and there’s a sticker on the window. You know the difference between make and model, but you soon learn what a trim is. Some versions come with leather. Some have a sunroof. Some have all-wheel drive. Some have 20k miles, and a similarly priced car in a higher trim has 40k miles.


Bryan Whiting

father, innovator, data scientist

Data Scientist

Washington, D.C.