Chapter 6 Practice!

6.1 Final note

Congratulations! You have reached the end of the tutor-led training session. You are welcome to continue your self-directed learning by completing the following two 30-minute exercises!

Remember to complete the feedback form (it’s all anonymous) to help us improve your learning journey to coding at DfT.

As you continue your R journey, it is important to adopt good coding practices to write clean, maintainable, and efficient code. Here are some key tips to help you on your way:

  • use consistent naming conventions: opt for clear, descriptive names in lowercase, separated by underscores (e.g. my_data()), and avoid special characters or spaces
  • write readable code: add meaningful comments to explain your logic, and use consistent spacing and indentation to improve clarity for yourself and others
  • reference packages explicitly: use the package_name::function_name() format to show which package a function comes from. This avoids conflicts between functions that share a name across packages
  • organise your script: break your code into sections with meaningful headings and keep related code together for easier navigation
  • use version control: tools like Git let you track changes in your code, collaborate with others, and maintain a history of your work
  • continue learning: see below for a few useful DfT resources on coding in R
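As a rough illustration of the tips above, here is a short sketch using a small made-up data frame. The names journey_data, distance_km and long_journeys are hypothetical and chosen only to show the conventions. It assumes the dplyr package is installed.

```r
# Set up ----
# Hypothetical example data: three journeys with a distance in kilometres
journey_data <- data.frame(
  journey_id = c(1, 2, 3),
  distance_km = c(2.5, 12.0, 48.3)
)

# Process data ----
# Keep journeys over 10 km. The dplyr:: prefix makes it clear that this is
# dplyr's filter(), not the similarly named stats::filter()
long_journeys <- dplyr::filter(journey_data, distance_km > 10)

# Add the distance in miles as a new column
long_journeys <- dplyr::mutate(long_journeys,
                               distance_miles = distance_km * 0.621371)

# Check output ----
print(long_journeys)
```

Comments ending in four dashes are picked up as section headings in the RStudio script outline, which makes longer scripts easier to navigate.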

Here are a few useful DfT resources:

Our intermediate R courses:

6.2 Exercise

You have been tasked to choose the optimum Starbucks drink for your 09:00 meeting.

  1. Read in the Starbucks data from: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-12-21/starbucks.csv. This dataset includes serving information for all regular menu Starbucks items, such as serving size, whether it contains milk or whipped cream, and nutritional information.
  2. You don’t like whipped cream. Filter the dataset to remove any drinks with a non-zero amount of whipped cream.
  3. Also remove any drinks with a zero-sized serving.
  4. To make it through your meeting, you’re interested in the drink with the most sugar and caffeine per ml of drink. Create two new columns using mutate, calculating sugar per ml and caffeine per ml.
  5. Order the dataset by caffeine per ml and then sugar per ml, in order from highest to lowest.
  6. Check out the dataset to see what your drink recommendation is!
6.2. Solution


# Load dplyr for the pipe and the data manipulation functions
library(dplyr)

#1. Read in the Starbucks data
starbucks <- read.csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-12-21/starbucks.csv")

#2 and #3. Remove drinks with whipped cream or a zero-sized serving
starbucks_filtered <- starbucks %>%
  filter(whip == 0, serv_size_m_l > 0)

#4. Create two new columns: sugar and caffeine per ml
starbucks_mutate <- starbucks_filtered %>%
  mutate(sugar_per_ml = sugar_g / serv_size_m_l,
         caffeine_per_ml = caffeine_mg / serv_size_m_l)

#5. Order data by caffeine per ml and then sugar per ml, from high to low
starbucks_ordered <- starbucks_mutate %>%
  arrange(desc(caffeine_per_ml), desc(sugar_per_ml))

#6. Check out the top drink recommendation
top_starbucks_drink <- starbucks_ordered %>%
  # Get the first row, which has the highest caffeine per ml (sugar per ml only breaks ties)
  slice(1)
# Select the 'product_name' column to find out the top drink
top_starbucks_drink %>%
  select(product_name)
##                        product_name
## 1 Clover Brewed Coffee - Dark Roast
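
The same steps can also be chained into a single pipeline. The sketch below does this on a small made-up data frame so that it runs without an internet connection. toy_drinks and its values are invented for illustration and are not taken from the Starbucks data; only the column names match. It assumes dplyr is installed.

```r
library(dplyr)

# Made-up example data using the same column names as the Starbucks dataset
toy_drinks <- data.frame(
  product_name = c("Drink A", "Drink B", "Drink C", "Drink D"),
  whip = c(0, 1, 0, 0),
  serv_size_m_l = c(200, 300, 0, 400),
  sugar_g = c(10, 30, 0, 40),
  caffeine_mg = c(100, 150, 0, 100)
)

top_toy_drink <- toy_drinks %>%
  # Remove drinks with whipped cream or a zero-sized serving
  filter(whip == 0, serv_size_m_l > 0) %>%
  # Calculate sugar and caffeine per ml
  mutate(sugar_per_ml = sugar_g / serv_size_m_l,
         caffeine_per_ml = caffeine_mg / serv_size_m_l) %>%
  # Order by caffeine per ml first; sugar per ml only breaks ties
  arrange(desc(caffeine_per_ml), desc(sugar_per_ml)) %>%
  # Keep the first row only
  slice(1)

print(top_toy_drink$product_name)
```

In this toy data, Drink D has more sugar per ml than Drink A, but Drink A is ranked first because arrange() compares caffeine per ml before it looks at sugar per ml.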


6.3 Exercise

You need to pick the Spice Girls album with the most appropriate dance party energy for your upcoming 90s-themed birthday party.

  1. Read in the Spice Girls data from: https://github.com/jacquietran/spice_girls_data/raw/main/data/studio_album_tracks.csv. This dataset includes variables relating to Spice Girls album tracks, including the mood and musical properties of the tracks.
  2. You want to make sure that the album you choose has a good mixture of dance party energy. Group the data by album_name, and album_release_year. Summarise the data by album to produce mean values for the danceability, energy, and tempo.
  3. Pivot the data to gather the three track property columns into a single column.
  4. Plot a bar chart in ggplot with album name on the x axis and value of the property column on the y axis.
  5. Map the fill of the bars to the track property.
  6. Facet the chart by track property to produce 3 different charts for the different values.
  7. Choose which album you’ll be playing first at the party!
6.3. Solution


# Load the packages used in this solution
library(dplyr)
library(tidyr)
library(ggplot2)

#1. Read in the data
spice_girls <- read.csv("https://github.com/jacquietran/spice_girls_data/raw/main/data/studio_album_tracks.csv")

#2. Group by album and release year, work out means for danceability, energy and tempo 
spice_summary <- spice_girls %>%
  group_by(album_name, album_release_year) %>%
  summarise(mean_danceability = mean(danceability, na.rm = TRUE),
            mean_energy = mean(energy, na.rm = TRUE),
            mean_tempo = mean(tempo, na.rm = TRUE))

#3. Pivot the data to gather the three track property cols into one col
spice_long <- spice_summary %>%
  pivot_longer(cols = starts_with("mean_"), 
               names_to = "property", 
               values_to = "value")

#4. plot a bar chart: x = album_name, y = "value"
#5. map the fill of the bars to the track property
ggplot(spice_long, mapping = aes(x = album_name, y = value, fill = property)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Mean Track Properties by Album",
       x = "Album Name",
       y = "Mean Value") +
  theme_minimal() +
  #6. add facet_wrap() to chart by track property
  facet_wrap(~ property, scales = "free_y")
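
To see what the pivot step does to the shape of the data, here is a minimal sketch on a made-up summary table. toy_summary and its values are invented for illustration and are not taken from the Spice Girls data. It assumes tidyr is installed.

```r
library(tidyr)

# Made-up summary table: one row per album, one column per property
toy_summary <- data.frame(
  album_name = c("Album A", "Album B"),
  mean_danceability = c(0.6, 0.7),
  mean_energy = c(0.8, 0.5),
  mean_tempo = c(110, 125)
)

# Gather the three mean_ columns into a pair of columns:
# 'property' holds the old column name, 'value' holds the number
toy_long <- pivot_longer(toy_summary,
                         cols = starts_with("mean_"),
                         names_to = "property",
                         values_to = "value")

print(toy_long)
```

The wide table has two rows and the long table has six (two albums by three properties). Having a single property column is what allows ggplot to map the fill and the facets to it.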