Installing Python for Data Analysis
In this article, you'll learn about Anaconda, a Python distribution used for data analysis. By the end of the article, you will know how to install Anaconda and use IPython, an interactive Python shell for computing.
Background
Codecademy’s learning environment allows you to enter Python code and receive feedback on whether or not the code you entered is correct for a given exercise. In this article, we’ll walk you through how to install Python for data analysis so that you can write and run Python code outside of Codecademy and on your computer!
If your needs are different and you want to use Python for general programming, we recommend the following Codecademy resource:
Why build outside of Codecademy?
The programming world is massive, and it’s impossible to teach everything in one place. Although Codecademy excels at teaching you how to code via interactive lessons, we’d also like for you to learn how to code on your computer so that you can create personal projects (and perhaps share them with the world)!
In this article, we’ll cover the following topics:
- Which Version of Python Should I Install?
- Do I Need to Install Python?
- Python for Data Analysis
- What is Anaconda?
- What is Miniconda?
- Should I Install Anaconda or Miniconda?
- Installation: Anaconda
- Installation: Miniconda
- Was the Installation Successful?
- Managing Packages in Anaconda / Miniconda
- pip vs. conda
- Required Data Science Packages
Which Version of Python Should I Install?
Today, the debate rages on over which version of Python to use. Python 3.0 (often stylized as Python “3.x” to represent all incremental updates to 3.0) was released in 2008. Version 2.7, released in 2010, was perhaps the most widely used of all Python versions, but it is no longer the most recent or the most popular. As of this writing, the most recent version is Python 3.6.
There are some fundamental differences between Python 2.x and Python 3.x. You can read about these differences at the following resource:
This article will discuss how to install Python version 3.
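If you already have a Python interpreter on your computer, a short script can tell you which version it is. The following is a minimal sketch; the helper name is_python3 is made up for this illustration and is not part of Python or Anaconda.

```python
import sys

def is_python3(version_info=sys.version_info):
    """Return True when the major version number is 3."""
    # version_info behaves like a tuple: (major, minor, micro, ...)
    return version_info[0] == 3

# Print the version of the interpreter that is running this script
print("Running Python", sys.version.split()[0])
print("Is this Python 3?", is_python3())
```

Running the same script with a Python 2.7 interpreter would report False on the last line.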
Python for Data Analysis
Python is well-regarded for its readability and ease of use for relatively simple scripts and full applications. It’s capable of a lot more, however. Python is also perfect for large-scale data processing, analytics, and computing.
Anaconda is a Python distribution (a collection of specific software components) that provides you with Python and other essential data analysis tools. The rest of this article will explain what Anaconda is and how to install Anaconda (and different versions of it).
What is Anaconda?
Anaconda is an open-source Python distribution for large-scale data analytics (provided by Continuum Analytics, Inc.). It provides you with many of the tools you need to analyze large sets of data. When installed, Anaconda includes:
- The core Python language (you can choose which version)
- Over 1000 data science packages
- Package management with conda
- IPython
- Much more
What is Miniconda?
Miniconda is a slimmed-down version of Anaconda. The Anaconda download is large (a few gigabytes) and can take quite some time to download and install. Miniconda, on the other hand, is a smaller alternative. It includes only the basic requirements and allows you to install data science packages as needed, thereby decreasing the size and time of the download.
Should I Install Anaconda or Miniconda?
Installing Anaconda vs. Miniconda is ultimately your choice. We recommend installing Miniconda to decrease the amount of time required to set up everything. The rest of this article will explain how to install Miniconda.
Installation: Miniconda
This video details how to download and install Miniconda.
To install Miniconda, follow these steps:
1. Navigate to the Miniconda download page: Miniconda
2. Select the Python 3.6 installer for your computer’s operating system.
3. Locate the installer that you downloaded using Explorer (Windows) or Finder (Mac OS).
4. Run the installer. Use the following instructions based on your computer’s operating system:
Mac OS:
1. You may receive a notification about Xcode requiring additional components. Click “Install” and enter your password to proceed.
2. Open your terminal and navigate to the folder where you downloaded the installer. Type the following command in the terminal and press “Return” on your keyboard:
bash miniconda-filename.sh
miniconda-filename.sh is a fictional file name in the example above. Your file name will look something like Miniconda3-latest-MacOSX-x86_64.sh.
3. Follow all instructions in the terminal (you can press Enter as needed and type yes when necessary).
Windows:
- Follow the installation instructions provided by the installer.
Was the Installation Successful?
To test whether your installation was successful (regardless of your computer’s operating system), type the following command into your terminal:
conda list
You should see a list of all the packages that Miniconda installed. If you’re on a computer that uses Windows, you may have to first navigate to the folder where you installed Miniconda for the conda list command to function properly.
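If the conda list command is not recognized, it can help to check whether your terminal can find conda at all. The following is a minimal sketch of such a check in Python; the helper name find_conda is made up for this illustration, and the script only assumes that the conda command accepts the --version option.

```python
import shutil
import subprocess

def find_conda():
    """Return the full path to the conda executable, or None if it is not on your PATH."""
    return shutil.which("conda")

conda_path = find_conda()
if conda_path is None:
    print("conda was not found on your PATH.")
else:
    # Ask conda to report its own version number
    result = subprocess.run(
        [conda_path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    print("Found conda at:", conda_path)
    print(result.stdout.strip() or result.stderr.strip())
```

If the script reports that conda was not found, try opening a new terminal window so that the changes made by the installer take effect.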
Congrats! You now have Miniconda (with Python 3.6) installed on your computer, and you are ready for some data science!
Managing Packages in Anaconda / Miniconda
With Python, you can build just about anything, from simple scripts to full applications. The Python language, however, doesn’t come pre-installed with all of the fancy features you might want (or require), even when installed using Anaconda or Miniconda. When you need particular functionality, you can look toward Python packages. A package structures Python modules, which contain pre-written code that other developers have created for you. Modules are handy when you are looking for specific functionality.
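To make the difference between a package and a module concrete, here is a small sketch that uses only Python's standard library, so it runs without installing anything. The helper name is_package is made up for this illustration.

```python
import json          # a package: a folder that groups several modules
import json.decoder  # one of the modules inside the json package
import math          # a single module

def is_package(module):
    """Packages carry a __path__ attribute; plain modules do not."""
    return hasattr(module, "__path__")

print(is_package(json))  # True
print(is_package(math))  # False

# Using pre-written code from a module
print(math.sqrt(16))     # 4.0
```

Packages that you install later, such as SciPy, are imported and used in the same way.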
Usually, pip3 is used to install and manage Python 3 packages. It is the package manager for the official Python distribution. If you installed Python with Anaconda or Miniconda, however, the package manager is not pip3; it is conda.
Just as with pip3, you can use conda to install packages, like so:
conda install scipy
In the example above, conda will install the SciPy package, a popular package (among many) used for mathematics with Python.
To learn more about conda, visit the Conda documentation at the following link:
pip vs. conda
Although conda is the package manager for Anaconda (and Miniconda), pip3 is also included with Anaconda (and Miniconda). Certain packages will not be available from conda or Anaconda.org. When this happens, you can use pip3 to install packages.
Be careful when using pip3, though. Using pip3 to install data science packages available to conda can result in installation errors. In the next section, we’ll walk you through which data science packages you should install using conda.
Required Data Science Packages
To make the most of Anaconda / Miniconda, you’ll need the following data science packages. Use conda install to install them.
numpy
scipy
matplotlib
statsmodels
pandas
seaborn
For example:
conda install numpy
Codecademy courses make use of the packages listed above.
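Once you have installed the packages, you may want to confirm that Python can find all of them. The following is a minimal sketch of one way to do that; the names REQUIRED and missing_packages are made up for this illustration.

```python
import importlib.util

# The data science packages listed above
REQUIRED = ["numpy", "scipy", "matplotlib", "statsmodels", "pandas", "seaborn"]

def missing_packages(names):
    """Return the package names that Python cannot find for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    # conda install accepts several package names at once
    print("Still to install: conda install " + " ".join(missing))
else:
    print("All required packages are installed.")
```

Run the script with the Python that Miniconda installed. If you run it with a different Python on your computer, it will report on that installation's packages instead.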
Conclusion
So far, you’ve been writing Python code on Codecademy. Your learning journey, however, is not complete unless you can also write Python code outside of Codecademy, on your computer. If you’d like to analyze large sets of data with Python, we recommend installing Miniconda (with Python 3.6), and then using conda to install certain data science packages. Have fun analyzing data!