Recent Posts

I’ve taken a particular interest in names since I’m thinking of a name for my to-be-born son. I did a little digging through the Social Security Administration names database, which lists all names given to baby boys and girls in America. I began this exercise just to get a quality list of ideas, but my curiosity got the better of me.

Name Trends Since 1950

What was the most popular boy name since 1950?

I’m going to be a dad (again) soon. I don’t want my boy to have a common name, so I gotta do my research. I’m curious: how do names trend over time? If I pick a name today, will it be popular tomorrow?

Step 1: The Social Security Administration reports all baby names given each year in the United States, provided a name occurs at least 5 times.

Dplyr vs Datatable

In the world of data science in R, the battle between dplyr and data.table is real. Here I compare their performance against base R commands for some common tasks. Who will be the winner on speed and simplicity?

Make random datasets

    set.seed(71)
    size1 <- 4*10^6
    size2 <- size1 * 0.1
    df1 <- data.frame(id=paste0("SERVICE", 1:size1), value=rnorm(size1), stringsAsFactors=FALSE)
    df2 <- data.frame(id=paste0("SERVICE", sample(1:size1, size2)), value=rnorm(size2), stringsAsFactors=FALSE)
    dt1 <- data.table(df1)
    dt2 <- data.table(df2)
    # mtcars data
    M <- data.

In this post, I’ll cover how to customize some settings so you can get your own custom domain for your EC2 instance, say, app.example.com. Anyone interested in customizing an EC2 instance can use this, not just those who build R Shiny apps. I assume you have already read part 1, which describes how to launch an R Shiny app on EC2, and that you already have an EC2 instance running some useful app.

I want to run R Shiny on AWS using Docker. Here’s how to do it. In part 2, I’ll demonstrate how to get a custom domain and make the URL look clean.

Useful background reading

If you’re already comfortable with Docker, skip to the next section.

* Great simple tutorial on using Docker and Flask: Short and sweet.
* Docker for Beginners: Verbose and lengthy. Excellent introduction. Walks you through all the jargon.

Here are some notes on machine learning models.

Concepts Behind Decision Trees

* Bagging (bootstrap aggregation): Randomly sample with replacement, and average the results.
* Majority vote: The most commonly occurring prediction.
* Internal node: Where the splits occur.
* Branches: Segments that connect the nodes.
* Terminal node (leaves, regions): Where the observations end up. The average of the responses (or majority vote) is the prediction for future observations.
* Gini index: where m is the leaf and k is the class (0 or 1 for binary classification, but can be extended for multiple classes).
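
For reference, a common textbook way to write the Gini index of a leaf is sketched below. It assumes a symbol that is not defined in the notes above: \hat{p}_{mk}, the proportion of training observations in leaf m that belong to class k, with K classes in total.

```latex
% Gini index of leaf m, summed over the K classes.
% \hat{p}_{mk} is an assumed symbol: the share of leaf m's
% training observations that belong to class k.
G_m = \sum_{k=1}^{K} \hat{p}_{mk} \left( 1 - \hat{p}_{mk} \right)

% Binary case (K = 2), writing p for the share of class 1 in leaf m:
G_m = p(1 - p) + (1 - p)p = 2p(1 - p)
```

Under this definition a pure leaf (p = 0 or p = 1) has a Gini index of 0, and an evenly mixed binary leaf (p = 0.5) has the maximum value of 0.5, so smaller values mean purer leaves.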

BYU Football

Goal of this post: answer some interesting questions about BYU football and dive into different modeling approaches. I don’t explain my thinking below, but some of the charts might be cool.

Some questions of interest

How is Kalani Sitake doing in his second season compared to past BYU coaches? More challenging: how’s he doing relative to all second-season coaches?

    df_in <- read.csv(file.path(fp_data, 'byu_seasons.csv')) %>% distinct()

Some basic questions:

* How many years do we have data on?

Bryan Whiting

father, innovator, data scientist

Data Scientist

Washington, D.C.