Tidy Tuesday Analysis - Hip-hop Ratings

Analysis of hip-hop song rating data from BBC Music

Shon Inouye


Introduction

This Tidy Tuesday analysis revolves around hip-hop data and song ratings by critics within the genre. From the Tidy Tuesday GitHub repo and related article:

Earlier this year, BBC Music asked more than 100 critics, artists, and other music industry folks from 15 countries for their five favorite hip-hop tracks. Then they broke down the results of the poll into one definitive list. But BBC Music didn’t just publish a best-of list, they also published the complete poll results and a description of the simple algorithm they ranked the songs with. - Simon Jockers

We awarded 10 points for first ranked track, eight points for second ranked track, and so on down to two points for fifth place. The song with the most points won. We split ties by the total number of votes: songs with more votes ranked higher. Any ties remaining after this were split by first place votes, followed by second place votes and so on: songs with more critics placing them at higher up the lists up ranked higher. – BBC Music
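The scoring rule quoted above can be sketched in a few lines of R. This is a minimal illustration on made-up ballots, not BBC Music's actual code: `toy_ballots`, `toy_scores`, and the song names are hypothetical, and only the first two tie-break levels (first-place and second-place votes) are spelled out.

```r
library(dplyr)

# Hypothetical ballots: one row per vote, rank 1 = a critic's favorite track
toy_ballots <- data.frame(
  title = c("Song A", "Song A", "Song B", "Song D", "Song D", "Song E", "Song E"),
  rank  = c(1, 3, 1, 2, 2, 3, 4),
  stringsAsFactors = FALSE
)

toy_scores <- toy_ballots %>%
  # 10 points for rank 1, 8 for rank 2, ..., 2 for rank 5
  mutate(points = 2 * (6 - rank)) %>%
  group_by(title) %>%
  summarize(
    points = sum(points),
    n      = n(),             # total number of votes
    n1     = sum(rank == 1),  # first-place votes
    n2     = sum(rank == 2)   # second-place votes
  ) %>%
  # Most points first; ties go to more votes, then more first-place votes, and so on
  arrange(desc(points), desc(n), desc(n1), desc(n2))

toy_scores
```

In this toy example, Song A and Song D tie on points and on votes, so Song A ranks higher because of its first-place vote. Song E and Song B tie on points, and Song E ranks higher because it received more votes.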

Questions

  • Which songs received the most points?
  • What is the distribution of nominated song release years and song points over time?
  • What is the distribution of artist genders?
    • How has the distribution of artist genders changed over time?
  • What is the distribution of critics’ roles?
  • What is the distribution of critics’ primary country?

Analysis

Let’s start by loading our packages and data. From now on, I’ll be using the convenient tidytuesdayR package to load weekly data (instead of reading it with the link to the raw GitHub files).

# Load packages
#devtools::install_github("thebioengineer/tidytuesdayR")
library(tidytuesdayR)
library(tidyverse)
library(RcppRoll)

# Load data
tt_data <- tt_load(2020, week = 16)
artist_polls <- tt_data$polls
artist_rankings <- tt_data$rankings

While performing my analysis, I discovered an error in one of the observations of the song “All Of The Lights” by Kanye West ft. Rihanna & Kid Cudi.

# Show the conflicting rows in artist_polls
artist_polls %>% 
  filter(title == "All Of The Lights") %>% 
  head()
## # A tibble: 3 x 9
##    rank title artist gender  year critic_name critic_rols critic_country
##   <dbl> <chr> <chr>  <chr>  <dbl> <chr>       <chr>       <chr>         
## 1     1 All … Kanye… male    2010 Kish Lal    Critic      US            
## 2     2 All … Kanye… mixed   2010 Dan Runcie  Trapital    US            
## 3     2 All … Kanye… mixed   2010 Jan Wehn    Critic      Germany       
## # … with 1 more variable: critic_country2 <chr>

# Show the conflicting rows in artist_rankings
artist_rankings %>% 
  filter(title == "All Of The Lights") %>% 
  head()
## # A tibble: 2 x 12
##      ID title   artist    year gender points     n    n1    n2    n3    n4    n5
##   <dbl> <chr>   <chr>    <dbl> <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1    43 All Of… Kanye W…  2010 mixed      16     2     0     2     0     0     0
## 2    70 All Of… Kanye W…  2010 male       10     1     1     0     0     0     0

The gender of the artists should be mixed, so I corrected the labels in the polls dataset and merged the point/vote totals in the rankings dataset.

# Fix error in artist_polls
error_polls <- which(artist_polls$title == "All Of The Lights" & artist_polls$gender == "male")
artist_polls$gender[error_polls] <- "mixed"

# Fix error in artist_rankings
# Get indices of correct and error rows
error_rankings <- which(artist_rankings$title == "All Of The Lights" & artist_rankings$gender == "male")
correct_rankings <- which(artist_rankings$title == "All Of The Lights" & artist_rankings$gender == "mixed")

# Add points/votes
artist_rankings[correct_rankings, 6:12] <- artist_rankings[correct_rankings, 6:12] + artist_rankings[error_rankings, 6:12]

# Remove error row
artist_rankings <- artist_rankings[-c(error_rankings), ]
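The same merge can also be written with column names instead of positional indices such as `6:12`. The sketch below is one possible alternative, run on a toy copy of the two conflicting rows (values taken from the printed output above). `toy_rankings` and `toy_fixed` are hypothetical names, and keeping the smaller `ID` is an assumption made for illustration.

```r
library(dplyr)

# Toy copy of the two conflicting rows from artist_rankings
toy_rankings <- data.frame(
  ID     = c(43, 70),
  title  = "All Of The Lights",
  artist = "Kanye West ft. Rihanna & Kid Cudi",
  year   = 2010,
  gender = c("mixed", "male"),
  points = c(16, 10),
  n  = c(2, 1),
  n1 = c(0, 1),
  n2 = c(2, 0),
  n3 = 0, n4 = 0, n5 = 0,
  stringsAsFactors = FALSE
)

toy_fixed <- toy_rankings %>%
  # Relabel the mislabeled row so both rows share the same key columns
  mutate(gender = if_else(title == "All Of The Lights", "mixed", gender)) %>%
  group_by(title, artist, year, gender) %>%
  # Sum points and vote counts by name; keep the smaller ID (an assumption)
  summarize(across(points:n5, sum), ID = min(ID), .groups = "drop")

toy_fixed
```

The merged row ends up with 26 points from three votes (one first place and two second places), which agrees with the scoring rule: 10 + 8 + 8 = 26.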

Which songs received the most points?

Starting off simple, let’s look at the songs that received the most points. Every one of the 300+ songs received at least 2 points, so plotting them all would be unreadable; we’ll limit the plot to the top ten.

# Top 10 songs 
artist_rankings %>% 
  top_n(10, points) %>% 
  mutate(title_artist = paste0(title, "\nby ", artist),
         title_artist = fct_reorder(title_artist, points)) %>% 
  ggplot(aes(x = title_artist, y = points)) + 
  geom_col(fill = "steelblue") + 
  coord_flip() + 
  theme_minimal() + 
  theme(axis.text.y = element_text(hjust = 0)) +
  labs(title = "Which songs received the most points?",
       x = "",
       y = "Total points")

What is the distribution of nominated song releases and song points over time?

Hip-hop has a very interesting history, so the next thing I wanted to look at was how the release years and total points of these 300+ nominated songs were distributed over time.

# Number of nominated songs released per year
artist_rankings %>% 
  count(year) %>% 
  ggplot(aes(x = year, y = n)) + 
  geom_col(fill = "steelblue") + 
  theme_minimal() + 
  labs(title = "Number of nominated songs released per year",
       x = "Year",
       y = "Number of songs")

# Total number of points awarded per year
artist_rankings %>% 
  group_by(year) %>% 
  summarize(yearly_points = sum(points)) %>% 
  ggplot(aes(x = year, y = yearly_points)) + 
  geom_col(fill = "steelblue") + 
  theme_minimal() + 
  labs(title = "Number of song points awarded by release year",
       x = "Year",
       y = "Points")

Both plots show a sharp rise for songs released in the 1990s, which is consistent with hip-hop becoming a best-selling music genre in the mid-1990s and the top-selling music genre by 1999.

What is the distribution of artist genders?

The next thing I wanted to look at was the gender distribution of artists in the list of nominated songs.

# Artist gender distribution
artist_rankings %>% 
  count(gender)
## # A tibble: 3 x 2
##   gender     n
##   <chr>  <int>
## 1 female    23
## 2 male     267
## 3 mixed     20

With 267 of the 300+ nominated songs having male artists, we can definitely notice how male-dominated the space is.
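To put a number on that, the counts can be turned into shares. This small sketch re-enters the three counts printed above by hand, so it runs without downloading the data; `gender_counts` and `gender_share` are hypothetical names.

```r
library(dplyr)

# Counts copied from the count(gender) output above
gender_counts <- data.frame(
  gender = c("female", "male", "mixed"),
  n      = c(23, 267, 20),
  stringsAsFactors = FALSE
)

gender_share <- gender_counts %>%
  # Share of all nominated songs, as a percentage rounded to one decimal
  mutate(share_pct = round(100 * n / sum(n), 1))

gender_share
```

With 310 songs in total, male artists account for roughly 86% of the nominated songs.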

Note that these counts are per song, so artists with several songs among the 300+ nominees are counted more than once.

For unique artists, we can do this:

# Artist gender distribution, no duplicates
artist_rankings %>% 
  distinct(artist, .keep_all = TRUE) %>% 
  count(gender)
## # A tibble: 3 x 2
##   gender     n
##   <chr>  <int>
## 1 female    18
## 2 male     169
## 3 mixed     20

However, this still doesn’t account for artists collaborating with different combinations of other artists. We could extract the artists featured in songs (by splitting names on “ft.”, “feat”, “and”, and “&”), but not all of them are hip-hop artists, and it would take additional work to determine the gender of each artist in gender-mixed collaborations.
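As a rough idea of what that extraction could look like, here is a minimal base R sketch. The artist strings are illustrative and may not match the dataset's exact spelling, and `split_pattern` is a hypothetical regular expression, not something used elsewhere in this analysis.

```r
# Illustrative artist strings (spelling in the real data may differ)
toy_artists <- c(
  "Kanye West ft. Rihanna & Kid Cudi",
  "The Notorious B.I.G.",
  "Grandmaster Flash and The Furious Five"
)

# Split on "ft.", "feat"/"feat.", "and", or "&" when surrounded by whitespace
split_pattern <- "\\s+(ft\\.|feat\\.?|and|&)\\s+"
artist_parts <- strsplit(toy_artists, split_pattern, perl = TRUE)

artist_parts
```

The first string splits into three names and the second is left untouched, but the third shows another pitfall: an act whose name contains “and” is split into two pieces, so the results would still need manual review.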

How has the distribution of artist genders changed over time?

Given the caveat about duplicate artists mentioned above, here is what the distribution of artist genders looks like for the nominated songs over time:

# Distribution of artist genders over time
artist_rankings %>% 
  count(year, gender) %>% 
  ggplot(aes(x = year, y = n, fill = gender)) +
  geom_col() + 
  facet_wrap(~gender) +
  theme(legend.position = "none") +
  labs(title = "Number of songs per artist gender over time",
       x = "Year",
       y = "Number of songs")

Visualizing points, gender, and time together

Now that we’ve taken a look at points, gender, and time separately, let’s view how they all interact together.

# Points over time by gender
artist_rankings %>% 
  ggplot(aes(x = year, y = points, color = gender, label = title)) +
  geom_jitter(alpha = 0.6) + 
  geom_text(aes(label = ifelse(points >= 50, title, "")), 
            color = "black", 
            hjust = -0.1, 
            vjust = 0) +
  scale_color_brewer(palette="Set1") +
  theme_minimal() +
  labs(title = "",
       x = "Year",
       y = "Total points",
       color = "")

What is the distribution of critics’ roles?

Although the votes were cast by critics, industry figures, artists, and academics in the field of hip-hop and music, it is important to keep in mind that music is a subjective field. We can take a quick look at the most common roles among those voters:

# Top critic roles
artist_polls %>% 
  distinct(critic_name, .keep_all = TRUE) %>% 
  mutate(role_lump = fct_lump(critic_rols, 6)) %>% 
  count(role_lump, sort = TRUE) %>% 
  mutate(role_lump = fct_reorder(role_lump, n)) %>% 
  ggplot(aes(x = role_lump, y = n)) +
  geom_col(fill = "steelblue") + 
  coord_flip() + 
  theme_minimal() + 
  labs(title = "What is the distribution of critics' roles?",
       x = "Critic's role",
       y = "Number of critics")

Although the critics list a wide range of companies and roles, “Critic” is the only label that recurs consistently, and it is too generic to draw further insight from. The remaining roles lumped into “Other” each appear only once, so they cannot be generalized further without manual classification.

What is the distribution of critics’ primary country?

The last thing I wanted to investigate was which countries the critics came from.

# Plot critic primary countries
artist_polls %>% 
  distinct(critic_name, .keep_all = TRUE) %>% 
  count(critic_country) %>% 
  mutate(critic_country = reorder(critic_country, n)) %>% 
  ggplot(aes(x = critic_country, y = n)) +
  geom_col(fill = "steelblue") + 
  coord_flip() + 
  theme_minimal() + 
  labs(title = "What is the distribution of critics' primary country?",
       x = "Critic's country",
       y = "Number of critics")

Given that hip-hop originated and developed in the United States, it is not surprising that the majority of the critics polled were from there. This adds some context to the BBC’s statement, “We polled 108 critics from 15 countries across six continents.” It is interesting to see critics from countries that I hadn’t expected, such as China, Japan, India, and Russia.

Check out the code for this project on GitHub!
