Aggregates in R
Combining Grouping with Mutate

group_by() can also be used with the dplyr function mutate() to add columns to a data frame that involve per-group metrics.

Consider the same educational technology company’s enrollments table from the previous exercise:

user_id  course        quiz_score
1234     learn_r       80
1234     learn_python  95
4567     learn_r       90
4567     learn_python  55

You want to add a new column to the data frame that stores the difference between a row’s quiz_score and the average quiz_score for that row’s course. To add the column:

enrollments %>%
  group_by(course) %>%
  mutate(diff_from_course_mean = quiz_score - mean(quiz_score))

  • group_by() groups the data frame by course into two groups: learn_r and learn_python
  • mutate() adds a new column, diff_from_course_mean, calculated as the difference between a row’s individual quiz_score and the mean(quiz_score) for that row’s group (course)

The resulting data frame would look like this:

user_id  course        quiz_score  diff_from_course_mean
1234     learn_r       80          -5
1234     learn_python  95          20
4567     learn_r       90          5
4567     learn_python  55          -20
  • The average quiz_score for the learn_r course is 85, so diff_from_course_mean is calculated as quiz_score - 85 for all the rows of enrollments with a value of learn_r in the course column.
  • The average quiz_score for the learn_python course is 75, so diff_from_course_mean is calculated as quiz_score - 75 for all the rows of enrollments with a value of learn_python in the course column.
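
To try the example end to end, here is a minimal sketch that rebuilds the four-row enrollments table by hand and applies the same group_by() and mutate() steps. The data.frame() construction, the result variable name, and the trailing ungroup() call are additions for illustration and are not part of the lesson's code.

```r
library(dplyr)

# Rebuild the example enrollments table from the lesson text (typed in by hand)
enrollments <- data.frame(
  user_id = c(1234, 1234, 4567, 4567),
  course = c("learn_r", "learn_python", "learn_r", "learn_python"),
  quiz_score = c(80, 95, 90, 55)
)

result <- enrollments %>%
  group_by(course) %>%
  # Inside mutate(), mean(quiz_score) is computed once per course group
  mutate(diff_from_course_mean = quiz_score - mean(quiz_score)) %>%
  # Assumption: drop the grouping so later steps operate on the whole table
  ungroup()

print(result)
```

Unlike summarize(), which collapses each group to a single row, mutate() keeps all four rows and only adds the new column.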

Instructions

1. You want to be able to tell how expensive each order is compared to the average price of orders with the same shoe_type.

Group orders by shoe_type and create a new column named diff_from_shoe_type_mean that stores the difference between an order’s price and the average price of orders with the same shoe_type. Save the result to diff_from_mean, and view it.

Don’t forget to include na.rm = TRUE as an argument in the summary function you call!
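
To see why na.rm = TRUE matters, here is a small sketch using a hypothetical variant of the enrollments table in which one quiz_score is missing. The enrollments_na data frame and the diff_default and diff_na_rm column names are invented for this illustration and do not come from the exercise's orders data.

```r
library(dplyr)

# Hypothetical variant of the enrollments table with one missing quiz_score
enrollments_na <- data.frame(
  user_id = c(1234, 1234, 4567, 4567),
  course = c("learn_r", "learn_python", "learn_r", "learn_python"),
  quiz_score = c(80, 95, NA, 55)
)

compared <- enrollments_na %>%
  group_by(course) %>%
  mutate(
    # Default behavior: a single NA makes the whole learn_r group mean NA
    diff_default = quiz_score - mean(quiz_score),
    # na.rm = TRUE tells mean() to skip missing values within each group
    diff_na_rm = quiz_score - mean(quiz_score, na.rm = TRUE)
  ) %>%
  ungroup()

print(compared)
```

With the default, every learn_r row ends up NA in diff_default. With na.rm = TRUE, only the row whose own quiz_score is missing stays NA.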
