Multiple Tables in Pandas
Learn how to combine information from multiple DataFrames.
StartKey Concepts
Review core concepts you need to learn to master this subject
Efficient Data Storage with Multiple Tables
Efficient Data Storage with Multiple Tables
For efficient data storage, related information is often spread across multiple tables of a database.
Consider an e-commerce business that tracks the products that have been ordered from its website. Business data for the company could be split into three tables:
orders
would contain the information necessary to describe an order:order_id
,customer_id
,product_id
,quantity
, andtimestamp
products
would contain the information to describe each product:product_id
,product_description
andproduct_price
customers
would contain the information for each customer:customer_id
,customer_name
,customer_address
, andcustomer_phone_number
This table structure prevents the storage of redundant information, given that each customer’s and product’s information is only stored once, rather than each time a customer places an order for another item.
- 1In order to efficiently store data, we often spread related information across multiple tables. For instance, imagine that we own an e-commerce business and we want to track the products that have…
- 2Suppose we have the following three tables that describe our eCommerce business: - orders — a table with information on each transaction: order_id customer_id product_id quantity …
- 3It is easy to do this kind of matching for one row, but hard to do it for multiple rows. Luckily, Pandas can efficiently do this for the entire table. We use the .merge method. The *.merge…
- 4In addition to using pd.merge, each DataFrame has its own merge method. For instance, if you wanted to merge orders with customers, you could use: new_df = orders.merge(customers) This produces t…
- 5In the previous example, the merge function “knew” how to combine tables based on the columns that were the same between two tables. For instance, products and orders both had a column called prod…
- 6In the previous exercise, we learned how to use rename to merge two DataFrames whose columns don’t match. If we don’t want to do that, we have another option. We could use the keywords left_on an…
- 7In our previous examples, there were always matching values when we were performing our merges. What happens when that isn’t true? Let’s imagine that our products table is out of date and is miss…
- 8In the previous exercise, we saw that when we merge two DataFrames whose rows don’t match perfectly, we lose the unmatched rows. This type of merge (where we only include matching rows) is called …
- 9Let’s return to the merge of Company A and Company B. — ### Left Merge Suppose we want to identify which customers are missing phone information. We would want a list of all customers who have…
- 10Sometimes, a dataset is broken into multiple tables. For instance, data is often split into multiple CSV files so that each download is smaller. When we need to reconstruct a single DataFrame fro…
What you'll create
Portfolio projects that showcase your new skills
How you'll master it
Stress-test your knowledge with quizzes that help commit syntax to memory