Chapter 6 Practice!

6.1 Exercise 1: You need to choose the optimum Starbucks drink for your 9am meeting


  1. Read in the Starbucks data from: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-12-21/starbucks.csv. This dataset includes serving information for all regular menu Starbucks items, such as serving size, whether it contains milk or whipped cream, and nutritional information.
  2. You don’t like whipped cream. Filter the dataset to remove any drinks with a non-zero amount of whipped cream.
  3. Also remove any drinks with a zero-sized serving.
  4. To make it through your meeting, you’re interested in the drink with the most sugar and caffeine per ml of drink. Create two new columns using mutate, calculating sugar per ml and caffeine per ml.
  5. Order the dataset by caffeine per ml and then sugar per ml, in order from highest to lowest.
  6. Check out the dataset to see what your drink recommendation is!
6.1. Solution


#1. Load the tidyverse (for %>%, filter, mutate, arrange) and read in the Starbucks data
library(tidyverse)
starbucks <- read.csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-12-21/starbucks.csv")

#2-3. Remove drinks with whipped cream and drinks with a zero-sized serving
starbucks_filtered <- starbucks %>%
  filter(whip == 0, serv_size_m_l > 0)

#4. Create two new columns: sugar per ml and caffeine per ml
starbucks_mutate <- starbucks_filtered %>%
  mutate(sugar_per_ml = sugar_g / serv_size_m_l,
         caffeine_per_ml = caffeine_mg / serv_size_m_l)

#5. Order data by caffeine per ml, then sugar per ml, from high to low
starbucks_ordered <- starbucks_mutate %>%
  arrange(desc(caffeine_per_ml), desc(sugar_per_ml))

#6. Check out the top drink recommendation
top_starbucks_drink <- starbucks_ordered %>%
  # Get the first row, which has the highest caffeine per ml (sugar per ml only breaks ties)
  slice(1)
# Select the product_name column to find the top drink
top_starbucks_drink %>%
  select(product_name)
##                        product_name
## 1 Clover Brewed Coffee - Dark Roast
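
The intermediate objects in the solution can also be collapsed into a single pipeline. The sketch below does this on a small made-up data frame, so it runs without a network connection. The drink names and numbers are illustrative assumptions, not values from the Starbucks data; only the column names are meant to match the real dataset.

```r
library(dplyr)

# Toy data: made-up drinks that reuse the Starbucks column names
toy_drinks <- data.frame(
  product_name  = c("Drink A", "Drink B", "Drink C", "Drink D"),
  whip          = c(0, 1, 0, 0),
  serv_size_m_l = c(200, 400, 0, 400),
  sugar_g       = c(10, 40, 0, 40),
  caffeine_mg   = c(100, 300, 50, 100)
)

toy_top <- toy_drinks %>%
  # Drop Drink B (has whipped cream) and Drink C (zero-sized serving)
  filter(whip == 0, serv_size_m_l > 0) %>%
  # Express sugar and caffeine relative to serving size
  mutate(sugar_per_ml = sugar_g / serv_size_m_l,
         caffeine_per_ml = caffeine_mg / serv_size_m_l) %>%
  # Sort by caffeine first; sugar is only consulted when caffeine ties
  arrange(desc(caffeine_per_ml), desc(sugar_per_ml)) %>%
  slice(1)

toy_top$product_name
```

In this toy data Drink A comes out on top: it has more caffeine per ml than Drink D, even though Drink D has more sugar per ml. That is how `arrange()` treats multiple columns: later columns only break ties in earlier ones.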


6.2 Exercise 2: You need to pick the Spice Girls album with the most appropriate dance party energy for your upcoming 90s-themed birthday party


  1. Read in the Spice Girls data from: https://github.com/jacquietran/spice_girls_data/raw/main/data/studio_album_tracks.csv. This dataset includes variables relating to Spice Girls album tracks, including the mood and musical properties of the tracks.
  2. You want to make sure that the album you choose has a good mixture of dance party energy. Group the data by album_name and album_release_year, then summarise each album to produce mean values for danceability, energy, and tempo.
  3. Pivot the data to gather the three track property columns into a single column.
  4. Plot a bar chart in ggplot with album name on the x axis and value of the property column on the y axis.
  5. Map the fill of the bars to the track property.
  6. Facet the chart by track property to produce 3 different charts for the different values.
  7. Choose which album you’ll be playing first at the party!
6.2. Solution


#1. Load the tidyverse (dplyr, tidyr, ggplot2) and read in the data
library(tidyverse)
spice_girls <- read.csv("https://github.com/jacquietran/spice_girls_data/raw/main/data/studio_album_tracks.csv")

#2. Group by album and release year, work out means for danceability, energy and tempo
spice_summary <- spice_girls %>%
  group_by(album_name, album_release_year) %>%
  summarise(mean_danceability = mean(danceability, na.rm = TRUE),
            mean_energy = mean(energy, na.rm = TRUE),
            mean_tempo = mean(tempo, na.rm = TRUE),
            # Return an ungrouped table and avoid the regrouping message
            .groups = "drop")

#3. Pivot the data to gather the three track property cols into one col
spice_long <- spice_summary %>%
  pivot_longer(cols = starts_with("mean_"), 
               names_to = "property", 
               values_to = "value")

#4. plot a bar chart: x = album_name, y = "value"
#5. map the fill of the bars to the track property
ggplot(spice_long, mapping = aes(x = album_name, y = value, fill = property)) +
  geom_col() +
  labs(title = "Mean Track Properties by Album",
       x = "Album Name",
       y = "Mean Value") +
  theme_minimal() +
  #6. add facet_wrap() to chart by track property
  facet_wrap(~ property, scales = "free_y")
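
The pivot in step 3 is the hardest part of the solution to picture, so here is a minimal sketch of what `pivot_longer()` does to a summary table. The album names and values are made up for illustration and are not taken from the Spice Girls data; only the `mean_` column-name prefix is borrowed from the solution.

```r
library(tidyr)

# Toy wide table: one row per album, one column per track property
toy_wide <- data.frame(
  album_name        = c("Album X", "Album Y"),
  mean_danceability = c(0.70, 0.60),
  mean_energy       = c(0.80, 0.90)
)

# Gather every column starting with "mean_" into name/value pairs:
# the old column names go into `property`, the numbers into `value`
toy_long <- pivot_longer(toy_wide,
                         cols = starts_with("mean_"),
                         names_to = "property",
                         values_to = "value")

toy_long
```

The two-row, three-column table becomes a four-row table with columns `album_name`, `property`, and `value`. This long shape is what lets ggplot map `property` to `fill` and to the facets.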