Key Concepts

Review core concepts you need to learn to master this subject

Pandas

import pandas as pd

Pandas is an open-source library used to analyze data in Python. It takes in data, such as a CSV file or a table from a SQL database, and creates an object with rows and columns called a DataFrame. Pandas is typically imported with the alias pd.
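As a minimal sketch of the idea above, the snippet below builds one DataFrame by hand and loads another from CSV text. The column names and values are hypothetical, and the CSV is read from an in-memory string (via io.StringIO) rather than a real file so the example is self-contained.

```python
import io

import pandas as pd

# Build a DataFrame by hand from a dictionary of columns.
# The column names and values here are made up for illustration.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloe"],
    "age": [28, 22, 35],
})

# Load a DataFrame from CSV data. A real project would usually pass a
# file path to pd.read_csv(); an in-memory string stands in for the file.
csv_text = "name,age\nAna,28\nBen,22\nChloe,35\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df.shape)           # (number of rows, number of columns)
print(df_from_csv.head()) # first rows of the loaded data
```

Both objects are ordinary DataFrames, so everything that works on one works on the other.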

Creating, Loading, and Selecting Data with Pandas
Lesson 1 of 2
  1. Pandas is a Python module for working with tabular data (i.e., data in a table with rows and columns). Pandas offers much of the same functionality as SQL or Excel, but adds the power of…
  2. A DataFrame is an object that stores data as rows and columns. You can think of a DataFrame as a spreadsheet or as a SQL table. You can manually create a DataFrame or fill it with data from a CSV, …
  3. You can also add data using lists. For example, you can pass in a list of lists, where each one represents a row of data. Use the keyword argument columns to pass a list of column names. df2 …
  4. We now know how to create our own DataFrame. However, most of the time, we’ll be working with datasets that already exist. One of the most common formats for big datasets is the CSV. *CSV (com…
  5. When you have data in a CSV, you can load it into a DataFrame in Pandas using .read_csv(): pd.read_csv('my-csv-file.csv') In the example above, the .read_csv() function is called. The CSV file cal…
  6. When we load a new DataFrame from a CSV, we want to know what it looks like. If it’s a small DataFrame, you can display it by typing print(df). If it’s a larger DataFrame, it’s helpful to be able…
  7. Now we know how to create and load data. Let’s select parts of those datasets that are interesting or important to our analyses. Suppose you have the DataFrame called customers, which contains the…
  8. When you have a larger DataFrame, you might want to select just a few columns. For instance, let’s return to a DataFrame of orders from ShoeFly.com: |id|first_name|last_name|email|shoe_type|sh…
  9. Let’s revisit our orders from ShoeFly.com: |id|first_name|last_name|email|shoe_type|shoe_material|shoe_color| |-|-|-|-|-|-|-| |54791|Rebecca|Lindsay|RebeccaLindsay57…
  10. You can also select multiple rows from a DataFrame. Here are a few more rows from ShoeFly.com’s orders DataFrame: |id|first_name|last_name|email|shoe_type|shoe_material|shoe_color| |-|-|-|-|-|…
  11. You can select a subset of a DataFrame by using logical statements: df[df.MyColumnName == desired_column_value] We have a large DataFrame with information about our customers. A few of the many r…
  12. You can also combine multiple logical statements, as long as each statement is in parentheses. For instance, suppose we wanted to select all rows where the customer’s age was under 30 or the cus…
  13. Suppose we want to select the rows where the customer’s name is either “Martha Jones”, “Rose Tyler” or “Amy Pond”. |name|address|phone|age| |-|-|-|-| |Martha Jones|123 Main St.|234-567-8910|2…
  14. When we select a subset of a DataFrame using logic, we end up with non-consecutive indices. This is inelegant and makes it hard to use .iloc[]. We can fix this using the method .reset_index(). F…
  15. You’ve completed the lesson! You’ve just learned the basics of working with a single table in Pandas, including: - Creating a table from scratch - Loading data from another file - Selecting certain …
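The selection techniques named in the lesson outline above can be sketched in one short script. This is an illustration under stated assumptions, not the lesson's own code: the customers table, its column names, and every value in it are hypothetical.

```python
import pandas as pd

# Create a DataFrame from a list of lists; each inner list is one row.
# All names and values below are made up for illustration.
customers = pd.DataFrame(
    [
        ["Ana", 28, "Springfield"],
        ["Ben", 22, "Shelbyville"],
        ["Chloe", 35, "Springfield"],
        ["Dev", 41, "Ogdenville"],
    ],
    columns=["name", "age", "city"],
)

# Select one column (a Series) or several columns (a DataFrame).
ages = customers["age"]
subset = customers[["name", "city"]]

# Select rows by position: a single row, then a slice of rows.
third_row = customers.iloc[2]
first_two = customers.iloc[0:2]

# Select rows with a logical statement.
under_30 = customers[customers["age"] < 30]

# Combine statements with | (or) and & (and); each one needs parentheses.
young_or_springfield = customers[
    (customers["age"] < 30) | (customers["city"] == "Springfield")
]

# Match any value from a list with .isin().
picked = customers[customers["name"].isin(["Ana", "Dev"])]

# Filtering keeps the original row labels, so the index now has gaps.
# reset_index(drop=True) renumbers from 0 and discards the old labels.
picked_reset = picked.reset_index(drop=True)

print(list(picked.index))
print(list(picked_reset.index))
```

Passing drop=True is a choice made here to avoid keeping the old index as an extra column; without it, reset_index() adds a column named index holding the previous labels.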

What you'll create

Portfolio projects that showcase your new skills


How you'll master it

Stress-test your knowledge with quizzes that help commit syntax to memory
