In this post, I’ll cover how to customize a few settings to get your own custom domain for your EC2 instance, say, app.example.com. Anyone interested in customizing an EC2 instance can use this - not just those who build R Shiny apps. I assume you’ve already read part 1, which describes how to launch an R Shiny app on EC2, and that you already have an EC2 instance running a useful app.
Continue reading

I want to run R Shiny on AWS using Docker. Here’s how to do it. In part 2, I’ll demonstrate how to get a custom domain and make the URL look clean.

Useful background reading

If you’re already comfortable with Docker, skip to the next section.

* Great Simple tutorial on using Docker and Flask: Short and sweet.
* Docker for Beginners: Verbose and lengthy. Excellent introduction. Walks you through all the jargon.
Continue reading

Here are some notes on machine learning models.

Concepts Behind Decision Trees

* Bagging (bootstrap aggregation): Randomly sample with replacement, and average the results.
* Majority vote: The most commonly occurring prediction.
* Internal node: Where the splits occur.
* Branches: Segments that connect the nodes.
* Terminal node (leaves, regions): Where the observations end up. The average of the responses (or majority vote) is the prediction for future observations.
* Gini index: \(G_m = \sum_k \hat{p}_{mk}(1 - \hat{p}_{mk})\), where \(m\) is the leaf, \(k\) is the class (0 or 1 for binary classification, but can be extended to multiple classes), and \(\hat{p}_{mk}\) is the proportion of observations in leaf \(m\) that belong to class \(k\).
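A minimal sketch of the Gini index for a single leaf, in base R. It assumes the standard definition, the sum over classes of p * (1 - p), where p is the share of the leaf's observations in each class. The function name `gini_index` is hypothetical and not taken from the post.

```r
# Gini index for one leaf (assumed standard definition):
# sum over classes k of p_k * (1 - p_k), where p_k is the
# proportion of the leaf's observations that fall in class k.
gini_index <- function(labels) {
  p <- as.numeric(table(labels)) / length(labels)
  sum(p * (1 - p))
}

pure_leaf  <- gini_index(c(0, 0, 0, 0))  # every observation in one class: 0
mixed_leaf <- gini_index(c(0, 0, 1, 1))  # evenly split binary leaf: 0.5

print(pure_leaf)
print(mixed_leaf)
```

A lower value means a purer leaf, which is why splits that reduce the index are preferred when growing a tree.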
Continue reading

BYU football

Goal of this post: Answer some interesting questions about BYU football and dive into different modeling approaches. I don’t explain my thinking below, but some of the charts might be cool.

Some questions of interest

* How is Kalani Sitake doing in his second season compared to past BYU coaches?
* More challenging: how’s he doing relative to all second-season coaches?

library(dplyr)  # provides %>% and distinct()
df_in <- read.csv(file.path(fp_data, 'byu_seasons.csv')) %>% distinct()

Some basic questions:

* How many years do we have data on?
Continue reading

The goal of this tutorial is to do the following:

* Collect addresses (via Google Forms)
* Download to R (via googlesheets)
* Geocode them (via geocode)
* Plot them (using leaflet)
* Get driving distance between them (via gmapsdistance)
* Cluster them (kmeans)
* Make the leaflet plot fancy

1. Collect

Perhaps in a future post I’ll explore googleformr. For now, I create forms the old-school way.
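The clustering step listed above can be sketched with base R's `kmeans`. This is an illustration under stated assumptions, not the tutorial's actual code: the coordinates are made up, and clustering on raw longitude/latitude (rather than on driving distance) is my simplification.

```r
# Hypothetical geocoded addresses; these coordinates are invented for illustration.
coords <- data.frame(
  lon = c(-77.04, -77.03, -77.05, -76.61, -76.62, -76.60),
  lat = c(38.90, 38.91, 38.89, 39.29, 39.30, 39.28)
)

set.seed(42)  # kmeans picks random starting centers, so fix the seed

# Assumption: group the addresses into 2 clusters on raw lon/lat.
# nstart = 10 reruns the algorithm from 10 random starts and keeps the best fit.
fit <- kmeans(coords[, c("lon", "lat")], centers = 2, nstart = 10)

coords$cluster <- fit$cluster  # cluster label for each address
print(coords)
print(fit$centers)             # one row of (lon, lat) per cluster center
```

The cluster labels can then be used to color markers on a map. Note that Euclidean distance on longitude/latitude is only a rough proxy for real travel distance.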
Continue reading

The beauty of open source is “Oh, let me just download that package and I can do amazing things!” The reality is “OK, I downloaded it, and I got the ‘hello world’ example working. But now, actually getting it to do what I want in the environment that I want takes like…30 hours? Just one more bug and I’ll finally give up…”

Bugs I hit: I hit a lot of bugs when building my Leaflet tutorial.
Continue reading

Last March, I wanted to break into the Data Science community but was struggling with confidence and starting to doubt I’d make it. I had been reading blog after blog about data science, practicing coding, and learning new methods at night. With rejection after rejection, I wondered if I was on the right path. Now, on the other side, hindsight is 20/20. I write this post to my past self, a person seeking a job they couldn’t seem to get.
Continue reading

Bryan Whiting

father, innovator, data scientist

Data Scientist

Washington, D.C.