Did Disney Princesses Distort Girl Names in USA?

My goal is to analyse how the frequency of names borne by Disney's female characters has changed over time in the US. Specifically, I want to see whether the movie releases had any impact on the names' popularity. For this purpose, I use the babynames dataset, which is available on CRAN.

The idea for this exercise was inspired by Sean Kross’ blog post.

Short description of the dataset

From the CRAN package description:

The SSA baby names data comes from social security number (SSN) applications. SSA cards were first issued in 1936, but were only needed for people with an income. In 1986, the law changed effectively requiring all children to get an SSN at birth.

The dataset is quite simple, covering US baby name records from 1880 to 2015. For each year, name and sex (F or M), it gives the number of babies who received that name (n) and the proportion of all babies of that sex born that year that they represent (prop).

library(babynames)
baby <- babynames
# Convert sex to a factor so that summary() reports counts per level
baby$sex <- as.factor(baby$sex)
summary(baby)
##       year      sex             name                 n          
##  Min.   :1880   F:1100858   Length:1858689     Min.   :    5.0  
##  1st Qu.:1950   M: 757831   Class :character   1st Qu.:    7.0  
##  Median :1983               Mode  :character   Median :   12.0  
##  Mean   :1973                                  Mean   :  183.4  
##  3rd Qu.:2002                                  3rd Qu.:   32.0  
##  Max.   :2015                                  Max.   :99680.0  
##       prop          
##  Min.   :2.260e-06  
##  1st Qu.:3.900e-06  
##  Median :7.350e-06  
##  Mean   :1.391e-04  
##  3rd Qu.:2.324e-05  
##  Max.   :8.155e-02

Loading packages

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

Quick data prep

I filter the female records for each princess name into a separate data frame.

ariel <- baby %>%
  filter(name == "Ariel", sex == "F")
belle <- baby %>%
  filter(name == "Belle", sex == "F")

jasmine <- baby %>%
  filter(name == "Jasmine", sex == "F")

tiana <- baby %>%
  filter(name == "Tiana", sex == "F")

merida <- baby %>%
  filter(name == "Merida", sex == "F")

elsa <- baby %>%
  filter(name == "Elsa", sex == "F")
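Six separate data frames work, but keeping all princess names in one long data frame is often easier to compare or facet. A minimal base R sketch, assuming baby has the babynames columns shown above; princess_names and filter_princesses() are hypothetical helpers, not part of the original analysis:

```r
# Hypothetical helper (not from the original post): collect all princess
# names in one long data frame instead of six separate ones.
princess_names <- c("Ariel", "Belle", "Jasmine", "Tiana", "Merida", "Elsa")

filter_princesses <- function(df, names = princess_names) {
  # Keep only female records whose name is in the supplied vector.
  # Assumes df has the babynames columns `name` and `sex`
  # (sex may be character or factor; == "F" works for both).
  df[df$sex == "F" & df$name %in% names, ]
}
```

For example, princesses <- filter_princesses(baby) followed by split(princesses, princesses$name) would give back one data frame per name, with the same rows as ariel, belle and so on.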

Next, I store the release year of the movie featuring each character.

# The Little Mermaid
ariel_release = 1989

# Beauty and the Beast
belle_release = 1991

# Aladdin
jasmine_release = 1992

# The Princess and the Frog
tiana_release = 2009

# Brave
merida_release = 2012

# Frozen
elsa_release = 2013
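The same release years can also be kept in a single lookup table and joined onto the name records, which turns a before/after comparison into a simple subtraction. A sketch under the assumption that the records use the babynames columns name and year; releases and add_release_info() are hypothetical names:

```r
# Hypothetical lookup table holding the same release years as the
# variables above, one row per character name.
releases <- data.frame(
  name    = c("Ariel", "Belle", "Jasmine", "Tiana", "Merida", "Elsa"),
  release = c(1989, 1991, 1992, 2009, 2012, 2013),
  stringsAsFactors = FALSE
)

add_release_info <- function(df, lookup = releases) {
  # Inner join on `name`: rows whose name is not in the lookup are dropped.
  out <- merge(df, lookup, by = "name")
  # Negative values are years before the release, positive values after it.
  out$years_since_release <- out$year - out$release
  out[order(out$name, out$year), ]
}
```

Aligning every name on years_since_release puts all six films on a common time axis, so their trajectories around the release can be overlaid in one plot.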

Plots

Finally, I plot the number of babies given each name per year. The arrows mark the year the movie was released, making it easier to compare the trend before and after. Additionally, I show the number of babies with each name and their proportion for the year preceding and the year following the release. The numbers (and graphs!) say it all :-)
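A minimal sketch of how such a plot and the before/after figures could be produced with ggplot2, which is loaded above. around_release() and plot_name_trend() are hypothetical helpers, and the arrow styling and position (tail above the series maximum, tip at the release-year count) are assumptions rather than the author's code:

```r
library(ggplot2)

# Hypothetical helper: the rows for the year before and the year after release.
# Assumes df holds a single name with the babynames columns year, n and prop.
around_release <- function(df, release_year) {
  df[df$year %in% c(release_year - 1, release_year + 1), c("year", "n", "prop")]
}

# Hypothetical helper: yearly counts with an arrow pointing at the release year.
plot_name_trend <- function(df, release_year, title = unique(df$name)) {
  # Arrow tip sits at the release-year count (0 if the name is absent that year);
  # the tail starts 10% above the series maximum.
  y_tip <- sum(df$n[df$year == release_year])
  y_tail <- max(df$n) * 1.1
  ggplot(df, aes(x = year, y = n)) +
    geom_line() +
    annotate("segment",
             x = release_year, xend = release_year,
             y = y_tail, yend = y_tip,
             colour = "red",
             arrow = grid::arrow(length = grid::unit(0.3, "cm"))) +
    labs(title = title, x = "Year", y = "Number of babies")
}
```

For instance, around_release(ariel, ariel_release) would return Ariel's 1988 and 1990 rows, and plot_name_trend(ariel, ariel_release) would draw her yearly counts with the arrow at 1989.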
