Key Concepts

Review core concepts you need to learn to master this subject

dplyr inner_join()

R data frame objects can be joined together with the dplyr function inner_join(). Corresponding rows with a matching column value in each data frame are combined into one row of a new data frame, and non-matching rows are dropped.

dplyr‘s inner_join() takes two data frames as arguments and returns a new data frame with the corresponding rows merged together. Non-matching rows from each data frame are dropped in the resulting data frame.

For example, consider the sales and targets data frames of a t-shirt company. sales contains the monthly revenue for the company and has two columns: month and revenue. targets contains the goals for monthly revenue for each month and has two columns: month and target. To perform an inner join on the two data frames using dplyr:

sales_vs_targets <- sales %>% inner_join(targets)

inner_join() will use the month column as the column to match on, as both the sales and target data frames have a month column. The resultant data frame will only contain the matching rows from sales and targets.

Multiple data frames can be merged together at once by stringing multiple calls to inner_join with the pipe %>%.

For example, consider the same sales and targets data frames of a t-shirt company. An additional data frame small_medium_large contains the number of small, medium and large t-shirts sold per month and has four columns: month, small, medium, and large. To perform an inner join on the three data frames using dplyr:

sales_vs_targets <- sales %>% inner_join(targets) %>% inner_join(small_medium_large)

inner_join() will use the month column as the column to match on, as the sales, target, and small_medium_large data frames have a month column. The resultant data frame will only contain the matching rows from sales, targets, and small_medium_large.

Joining Tables in R
Lesson 1 of 1
  1. 1
    In order to efficiently store data, we often spread related information across multiple tables. For instance, imagine that we own an e-commerce business and we want to track the products that have…
  2. 2
    Suppose we have the following three tables that describe our eCommerce business: - orders — a table with information on each transaction: order_id customer_id product_id quantity …
  3. 3
    It is easy to do this kind of matching for one row, but hard to do it for multiple rows. Luckily, dplyr can efficiently do this for the entire table using the inner_join() method. The **inner_jo…
  4. 4
    In addition to using inner_join() to join two data frames together, we can use the pipe %>% to join multiple data frames together at once. The following command would join orders with customers, an…
  5. 5
    In the previous example, the inner_join() function “knew” how to combine tables based on the columns that were the same between two tables. For instance, orders and customers both had a column cal…
  6. 6
    In the previous exercise, we learned how to use rename() to join two data frames whose columns don’t match. A better option, however, exists. We can add the by argument when calling inner_join() …
  7. 7
    In our previous examples, there were always matching values when we were performing our joins. What happens when that isn’t true? Let’s imagine that our products table is out of date and is missi…
  8. 8
    In the previous exercise, we saw that when we join two data frames whose rows don’t match perfectly, we lose the unmatched rows. This type of join (where we only include matching rows) is called a…
  9. 9
    Let’s return to the join of Company A and Company B. — ### Left Join Suppose we want to identify which customers are missing phone information. We would want a list of all customers who have e…
  10. 10
    Sometimes, a dataset is broken into multiple tables. For instance, data is often split into multiple CSV files so that each download is smaller. When we need to reconstruct a single data frame fr…
  11. 11
    This lesson introduced some methods for combining multiple data frames: * Creating a data frame made by matching the common columns of two data frames is called a join * We can specify which column…

What you'll create

Portfolio projects that showcase your new skills

Pro Logo

How you'll master it

Stress-test your knowledge with quizzes that help commit syntax to memory

Pro Logo