I’m going to be a dad (again) soon. I don’t want my boy to have a common name, so I gotta do my research. I’m curious: how do names trend over time? If I pick a name today, will it be popular tomorrow?

Step 1: The Social Security Administration (SSA) publishes every baby name given in the United States each year, as long as the name occurs at least 5 times. Thankfully, these files are easy to download from the SSA website.

Step 2: Combine the data across years.
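The post loads an already-combined `names.Rda` below, so here is a minimal base R sketch of what Step 2 could look like. It assumes the SSA national zip has been unpacked into one folder of `yobYYYY.txt` files, each holding header-less `name,sex,count` lines; `combine_ssa_years()` is a hypothetical helper, not part of any package.

```r
# Minimal sketch (hypothetical helper): stack the SSA yearly files into one
# data frame with columns name, sex, number, year, like df_all.
# Assumes files named "yobYYYY.txt" with header-less lines such as "Emma,F,18688".
combine_ssa_years <- function(dir) {
  files <- list.files(dir, pattern = "^yob[0-9]{4}\\.txt$", full.names = TRUE)
  yearly <- lapply(files, function(f) {
    # colClasses keeps "F" as a string instead of letting R read it as FALSE
    d <- read.csv(f, header = FALSE,
                  col.names = c("name", "sex", "number"),
                  colClasses = c("character", "character", "integer"))
    # The year only lives in the file name, e.g. "yob2018.txt" -> 2018
    d$year <- as.integer(sub("^yob([0-9]{4})\\.txt$", "\\1", basename(f)))
    d
  })
  do.call(rbind, yearly)
}

# Usage, assuming the zip was unpacked into a folder called "names":
# df_all <- combine_ssa_years("names")
```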

Let’s dive into the data and find the top three baby names each year for each sex.

library(dplyr)

load("~/github/blog/static/img/20190909/names.Rda")  # loads df_all: name, sex, number, year

# Add each name's share of its year and sex (pct), the running total of that
# share (cum_pct), babies per 10,000 (tenk), and the rank within year and sex
df2 <- df_all %>%
  group_by(year, sex) %>%
  arrange(year, sex, desc(number)) %>%
  mutate(pct = number/sum(number) * 100,
         cum_pct = cumsum(pct),
         tenk = pct/100 * 10000,
         rank = rank(desc(number), ties.method = 'first')) %>%
  ungroup()

# Look at the top 3 in each year
df2 %>%
  filter(rank <= 3) %>%
  select(-tenk, -cum_pct) %>%
  mutate(pct = round(pct, 2)) %>%
  arrange(desc(year), sex, rank) %>%
  head(20) %>%
  knitr::kable(col.names = c("Name", "Sex", "Number Babies", "Year", "% of Pop.", "Rank"))
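| Name    | Sex | Number Babies | Year | % of Pop. | Rank |
|:--------|:----|--------------:|-----:|----------:|-----:|
| Emma    | F   | 18688         | 2018 | 1.11      | 1    |
| Olivia  | F   | 17921         | 2018 | 1.06      | 2    |
| Ava     | F   | 14924         | 2018 | 0.88      | 3    |
| Liam    | M   | 19837         | 2018 | 1.10      | 1    |
| Noah    | M   | 18267         | 2018 | 1.01      | 2    |
| William | M   | 14516         | 2018 | 0.81      | 3    |
| Emma    | F   | 19800         | 2017 | 1.15      | 1    |
| Olivia  | F   | 18703         | 2017 | 1.09      | 2    |
| Ava     | F   | 15958         | 2017 | 0.93      | 3    |
| Liam    | M   | 18798         | 2017 | 1.02      | 1    |
| Noah    | M   | 18410         | 2017 | 1.00      | 2    |
| William | M   | 14967         | 2017 | 0.81      | 3    |
| Emma    | F   | 19496         | 2016 | 1.10      | 1    |
| Olivia  | F   | 19365         | 2016 | 1.10      | 2    |
| Ava     | F   | 16302         | 2016 | 0.92      | 3    |
| Noah    | M   | 19117         | 2016 | 1.01      | 1    |
| Liam    | M   | 18218         | 2016 | 0.96      | 2    |
| William | M   | 15761         | 2016 | 0.83      | 3    |
| Emma    | F   | 20455         | 2015 | 1.15      | 1    |
| Olivia  | F   | 19691         | 2015 | 1.11      | 2    |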
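For anyone following along without dplyr, here is a rough base R equivalent of the `df2` step. `add_share_and_rank()` is a hypothetical helper that assumes a data frame with `name`, `sex`, `number`, and `year` columns, like `df_all`. Sorting first and then numbering rows within each group mirrors `ties.method = 'first'`, because `order()` leaves tied rows in their existing order.

```r
# Rough base R equivalent of the df2 pipeline (hypothetical helper).
# Expects columns name, sex, number, year, like df_all.
add_share_and_rank <- function(d) {
  # Most common name first within each year and sex; order() is stable, so
  # tied counts keep their existing order (like ties.method = "first")
  d <- d[order(d$year, d$sex, -d$number), ]
  grp <- interaction(d$year, d$sex, drop = TRUE)
  d$pct <- d$number / ave(d$number, grp, FUN = sum) * 100  # % of that year and sex
  d$cum_pct <- ave(d$pct, grp, FUN = cumsum)                # running total of pct
  d$rank <- ave(d$number, grp, FUN = seq_along)             # 1 = most common
  rownames(d) <- NULL
  d
}

# Top three names per year and sex:
# subset(add_share_and_rank(df_all), rank <= 3)
```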


That’s interesting. It makes me wonder: what share of all babies get one of the top 100 names?

library(ggplot2)

# Cumulative share of all boys covered by the top 100 names in 2018
df2 %>% filter(sex == 'M', year == 2018, rank <= 100) %>%
  ggplot(aes(x = rank, y = cum_pct)) +
  geom_point() +
  ylim(c(0, 100)) +
  labs(x = "Name Rank", y = "Cumulative % of All Names",
       title = "The top 100 names account for > 40% of all names",
       caption = "Top 100 boy names, 2018") +
  theme_minimal()

Interesting. The top 100 names cover more than 40% of boys born in 2018. So if you pick one of them, the Birthday Paradox suggests your kid is much more likely than you’d think to share a name with someone in their class!
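The Birthday Paradox intuition can be put into numbers. A common approximation for the birthday problem with unequal probabilities is 1 - exp(-C(n, 2) · Σp²), where Σp² is the chance that two random kids share a name. `p_shared_name()` is a hypothetical helper; it assumes names are drawn independently at national rates, and any names left out of `p` are treated as never shared, so passing only the top 100 gives a lower estimate.

```r
# Birthday-problem approximation for unequal "birthdays" (hypothetical helper).
# p: name shares as proportions of all births (e.g. pct / 100)
# n: number of kids in the group
# sum(p^2) is the chance two random kids share a name; names missing from p
# are treated as never shared, so a partial p gives a lower estimate.
p_shared_name <- function(p, n) {
  1 - exp(-choose(n, 2) * sum(p^2))
}

# Sanity check against the classic birthday paradox (365 equal "names", 23 kids):
# p_shared_name(rep(1 / 365, 365), 23)

# With the 2018 top-100 boy names and a hypothetical class of 25 boys:
# top100 <- subset(df2, sex == "M" & year == 2018 & rank <= 100)$pct / 100
# p_shared_name(top100, 25)
```

The sanity check returns roughly 0.5, the familiar "23 people" result.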

That makes me think: what does the cumulative distribution of baby names look like beyond the top 100?

df2 %>%
  filter(sex == 'M', year == 2018, rank <= 2250) %>%
  ggplot(aes(x = rank, y = cum_pct)) +
  geom_point() +
  ylim(c(0, 100)) +
  labs(x = "Name Rank", y = "Cumulative % of All Names",
       title = "The top 500 boy names account for 75% of all names") +
  scale_x_continuous(breaks = seq(0, 2250, by = 250)) +
  theme_minimal() +
  theme(panel.grid.minor = element_blank())
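Rather than reading coverage off the curve, you can compute it directly. `names_to_cover()` is a hypothetical helper: it sorts raw counts, builds the same cumulative share as `cum_pct`, and returns the smallest number of top names that reaches a target percentage.

```r
# How many of the most common names are needed to cover target_pct % of
# babies? (hypothetical helper; counts = one entry per name)
names_to_cover <- function(counts, target_pct) {
  counts <- sort(counts, decreasing = TRUE)
  cum_share <- cumsum(counts) / sum(counts)   # same idea as cum_pct / 100
  which(cum_share >= target_pct / 100)[1]     # first rank that reaches the target
}

# For 2018 boys:
# names_to_cover(subset(df2, sex == "M" & year == 2018)$number, 75)
```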

This plot won’t really work on your phone, but it lets you see the path of every top-50 boy name since 1980. Just hover your mouse, click on a name to highlight it, and hold “shift” if you want to select multiple names:

library(plotly)

# Top 50 boy names each year since 1980, keyed by name so a click highlights one name
tmp <- df2 %>%
  filter(rank <= 50,
         year >= 1980,
         sex == 'M') %>%
  highlight_key(~name)

# Working ggplotly example: https://plotcon17.cpsievert.me/workshop/day2/#18
p <- ggplot(tmp, aes(year, rank, group = name)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Name Rank", title = "Top 50 Boy Names Since 1980\nHover your mouse to see trend") +
  scale_y_reverse() +
  theme_minimal()
gg <- ggplotly(p, tooltip = c("name", "rank"))

# Click a name to highlight it; double-click to clear the selection
highlight(gg, on = 'plotly_click', off = 'plotly_doubleclick')
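library(tidyr)
library(stringr)

# Wide table of boy-name ranks since 2010: one row per name, one column per
# year (y10 ... y18), sorted by 2018 rank
all_yrs <- df2 %>%
  filter(year >= 2010,
         sex == 'M') %>%
  mutate(year = str_replace(paste0('y', year), '20', '')) %>%
  select(name, year, rank) %>%
  spread(year, rank) %>%
  arrange(y18)

# Share and count of boys at each 2018 rank, to join onto all_yrs
rank2018 <- df2 %>%
  filter(year == 2018, sex == 'M') %>%
  select(rank, pct, number)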

# How "off" are the aggregate numbers here? Should we inflate?
# Compare the total of our names to the CDC data
# df2 %>%
#   filter(year == 2017) %>%
#   summarize(sum(number))
# 3,561,975
# CDC reports: https://www.cdc.gov/nchs/fastats/births.htm
# 2017 actual: 3,855,500
# 3561975/3855500 = 0.9239, i.e. ~92%
# So let's assume our numbers undercount births by about 8%: just divide each count by .9238

# Number of babies by sex in 2018
# df2 %>%
#   filter(year == 2018) %>%
#   count(sex, wt = number)
# M: 1,800,392
# F: 1,686,961

# Attach each name's 2018 share and undercount-adjusted count to its rank history
name_rank <- all_yrs %>%
  left_join(rank2018, by = c("y18" = "rank")) %>%
  mutate(number = floor(number/.9238)) %>%
  select(name, number, pct, paste0('y', 18:10))

name_rank %>% head(30)
## # A tibble: 30 x 12
##    name  number   pct   y18   y17   y16   y15   y14   y13   y12   y11   y10
##    <fct>  <dbl> <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
##  1 Liam   21473 1.10      1     1     2     2     2     3     6    15    30
##  2 Noah   19773 1.01      2     2     1     1     1     1     4     5     7
##  3 Will…  15713 0.806     3     3     3     5     5     5     5     3     5
##  4 James  14640 0.751     4     4     5     7     9    13    14    17    19
##  5 Oliv…  14493 0.744     5     9    12    19    32    52    73    78    88
##  6 Benj…  14484 0.743     6     6     6    10    12    14    16    19    22
##  7 Elij…  13948 0.716     7     8     9    11    11    11    13    13    18
##  8 Lucas  13623 0.699     8    11    14    16    19    23    27    29    35
##  9 Mason  13460 0.691     9     7     4     3     3     4     2     2    12
## 10 Logan  13370 0.686    10     5    18    14    13    18    22    21    17
## # … with 20 more rows
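The undercount adjustment in the comments above boils down to one ratio. Here is a small base R sketch; the totals are the 2017 figures quoted in those comments, and `inflate_counts()` is a hypothetical helper. It assumes the shortfall is spread evenly across names, which is a simplification: names used fewer than 5 times are dropped entirely, so treat the inflated counts as rough.

```r
# Undercount adjustment from the comments above (hypothetical helper).
ssa_total_2017 <- 3561975   # sum of df2$number for 2017
cdc_total_2017 <- 3855500   # CDC-reported births for 2017
coverage <- ssa_total_2017 / cdc_total_2017   # share of births the SSA files capture

# Scale a count up to an estimated true number of births, rounding down
inflate_counts <- function(number, coverage) {
  floor(number / coverage)
}

# The post rounds coverage to .9238, so Liam's 19,837 becomes 21,473:
# inflate_counts(19837, 0.9238)
```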

Criteria to weigh when choosing a name:

* Rarity: is your kid going to grow up with 3 other kids in their grade who share the name?
* Phonetics: do you like the sound? You’ll be the one saying it 100x per day.
* Symbolism: give a name symbolic of something else. For example, “Aonani” is a Hawaiian name meaning “beautiful light”.
* Season: a name related to when they were born, e.g., “Summer”. Or if they are born on a full moon, name them “Luna”.
* Family heritage: connect with your family roots and name your child after an ancestor.
* Personal touch: name them after someone you have a close connection to.
* Historical: a reference to a historical figure, e.g., Martin Luther King Jr. was named after Martin Luther.
* Biblical/Religious: a name common in your religion.

* Pronunciation
* Different spellings
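The Rarity question can be made concrete with the shares from the tables above. For a name with share p (as a proportion) and k other kids in the grade, the expected number of namesakes is k·p and the chance of at least one is 1 - (1 - p)^k. `name_overlap()` is a hypothetical helper that assumes names are independent draws at the national rate, ignoring regional and year-to-year clustering.

```r
# For one name with national share p (a proportion), estimate namesakes among
# k other kids in a grade (hypothetical helper). Assumes independent draws at
# the national rate, so local popularity clusters are ignored.
name_overlap <- function(p, k) {
  c(expected_matches = k * p,            # average number of other kids with the name
    p_at_least_one   = 1 - (1 - p)^k)    # chance that at least one shares it
}

# Liam's 1.10% share in 2018, in a hypothetical grade of 100 kids:
# name_overlap(0.011, 99)
```

For that example, the sketch gives about 1.1 expected namesakes and roughly a two-in-three chance of at least one.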