Key Concepts

Review core concepts you need to learn to master this subject

dplyr package

The dplyr package provides functions for exploring and manipulating datasets. At the most basic level, its functions are data manipulation “verbs” such as select(), filter(), mutate(), arrange() and summarize() that can be chained to carry out multiple steps in a few lines of code. The dplyr package is suitable for working with a single dataset as well as for achieving complex results in large datasets.

data frame object

A data frame is an R object that stores data in two dimensions, represented by columns and rows. The columns are the variables of the data frame and the rows are the observations. Each row represents one set of observations, with one value per variable. This structure is useful for storing columns of different data types and performing analysis on them.

Excluding Columns with select() in Dplyr

The select() function of dplyr can return all columns of a data frame except the ones specified. To exclude columns, add the - operator before the name of each column when passing it as an argument to select(). This returns a new data frame with all columns except those preceded by a - operator. For example: select(-genre, -spotify_monthly_listeners, -year_founded).

rename-dplyr

The rename() function of the dplyr package changes the column names of a data frame. Its syntax is simple: pass the new name, followed by the = operator and the old name of the column. To rename multiple columns based on logical criteria, rename() has variants such as rename_if(), rename_at() and rename_all().

Loading and Saving CSVs with R

The read_csv() and write_csv() functions belong to the readr package, which is part of the tidyverse, and read and write CSV files in R. The read_csv() function reads a file and converts it to a tibble, an enhanced form of data frame. The first argument of read_csv() is the file to be read. Tibbles in R can be exported to CSV files using the write_csv() function. The first argument of write_csv() is the tibble to be exported.

filter with logical operators

The filter() function can subset rows of a data frame based on logical conditions on certain columns. Each condition is passed explicitly as an argument with the following syntax: the name of the column, a comparison operator (<, ==, >, !=) and a value. Conditions on the same column or on different columns can be combined using the logical operators &, | and !.

Dplyr’s filter()

The filter() function of the dplyr package allows users to select a subset of rows in a data frame that match with certain conditions that are passed as arguments. The first argument of the function is the data frame and the following arguments are the conditional expressions that serve as the filter() criteria. For example: filter(artists, genre == 'Rock', spotify_monthly_listeners > 20000000).

data frames primary information

Data frames in R can be inspected using head() and summary(). The head() function accepts an integer argument that determines how many rows of the data frame are shown; by default it shows the first 6 rows. The summary() function returns summary statistics such as the minimum, maximum, mean, and three quartiles.

dplyr arrange()

The arrange() function of the dplyr package orders the rows of a data frame based on the values of a column or a set of columns passed as arguments. The resulting order can be ascending or descending. By default, arrange() sorts in ascending order; to sort in descending order, wrap the column in the desc() function.

mutate() dplyr

The mutate() function from the dplyr package adds new columns to a data frame, typically computed from existing columns, while keeping all the other columns. The function receives the data frame as its first argument, followed by the new column name, the = operator and an expression that computes its values. Further name = expression arguments can be added to create more columns in the same call.

Comma Separated Values (CSV)

CSV (comma-separated values) files are plain-text files that store spreadsheet-like data, using commas to separate individual values. This type of file is easy to manage and compatible with many different platforms. A CSV file can be imported into a database or into an Integrated Development Environment (IDE) to work with its content.

pipes

The pipe %>% passes a value or an object into the first argument of a function. Instead of passing the argument to the function separately, you can write the value or object first and then use the pipe to supply it as the function's first argument on the same line. This works with functions such as select() and filter(), which take a data frame as their first argument.

In the example, the weather data frame is piped into the select function that would select the first two columns of the weather data frame.

Python Comments

A comment is a piece of text within a program that is NOT executed as part of the program. Comments can be used to provide additional information to aid in understanding the code. A comment can appear at the start of a line or at the end of a line after a piece of code. A comment is started by the # character and continues until the end of the line. Everything after the # on that line is part of the comment, so it is NOT possible to include code on the same line after a comment has started.
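A minimal sketch of both comment positions described above. The variable name greeting is only illustrative.

```python
# A full-line comment: the interpreter skips this line entirely.
greeting = "Hello"  # An end-of-line comment placed after a piece of code.

# The next line is commented out, so it never runs:
# greeting = "Goodbye"

print(greeting)
```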

Python Arithmetic Operations

Python supports different types of arithmetic operations. These operations can be performed on literal numbers, variables, or some combination. The primary arithmetic operators are:

  • + for addition
  • - for subtraction
  • / for division
  • * for multiplication

The operator is placed between two numbers, variables, or expressions, and the result of the operation is evaluated. The result can be assigned to a new variable or used as an argument to a function call.
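As an illustration, the four operators can be combined with literal numbers and variables as in the sketch below. The names and values are hypothetical.

```python
# Hypothetical values used only for illustration.
price = 12.5
quantity = 4

total = price * quantity             # multiplication
discounted = total - 5               # subtraction with a literal number
with_shipping = discounted + 3       # addition
per_item = with_shipping / quantity  # division

# The result of an operation can also be passed to a function call.
print(per_item)
```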

Plus-Equals Operator

The plus-equals operator += provides a convenient way to add a value to an existing variable and assign the new value back to the same variable. When the variable and the value are strings, this operator performs string concatenation instead of addition. Numbers and strings are immutable, so += binds the variable to a new value, and any other variable that referred to the old value is unchanged. For mutable objects such as lists, += modifies the object in place, so other variables that refer to the same object see the change.
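A short sketch of += with a number and with a string. The variable names are illustrative.

```python
counter = 10
counter += 5            # equivalent to: counter = counter + 5

message = "Hello"
message += ", world"    # with strings, += concatenates

print(counter)
print(message)
```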

Python Variables

A variable is used to store data to be used by the program. This data can be a number, a string, a Boolean, a list or some other data type. Every variable has a name which can consist of letters, numbers, and the underscore character (_). No other type of characters can be used to create the variable name and the variable may NOT start with a number. The equal sign = is used to assign a value to a variable. That assignment can be from a fixed value or taken from another existing variable. It can also be used to change the value of a variable from one value to another after the initial assignment is made.
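The naming and assignment rules above might look like this in practice. All names here are made up for the example.

```python
user_name = "ada"        # letters and an underscore
user_id_2 = 42           # digits are allowed, but not as the first character
is_active = True         # a Boolean value
# 2nd_user = "bob"       # invalid name: it starts with a number

backup_name = user_name  # assigned from another existing variable
user_name = "grace"      # reassigned after the initial assignment

print(user_name, backup_name)
```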

Python Integer Division

In Python 3, the / operator always returns a floating-point result, even when both operands are integers. This behavior changed from Python 2, where dividing two integers with / produced an integer result. In Python 2, use of the float() function or a literal floating-point value was required to force division to produce a floating-point result.
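A brief Python 3 sketch of this behavior. It also shows the separate // operator, which is not covered above and performs floor division when an integer result is wanted.

```python
quotient = 7 / 2    # the result is a float even though both operands are integers
even = 8 / 2        # still a float, although 8 is evenly divisible by 2
floored = 7 // 2    # floor division keeps the result an integer

print(quotient, even, floored)
```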

Modulo Operator

Python supports the % operator to perform the modulo calculation. A modulo calculation returns the remainder of dividing the first number by the second. For example, the expression 4 % 2 evaluates to 0 because 4 is evenly divisible by 2, leaving no remainder. The expression 7 % 3 evaluates to 1 because 7 divided by 3 leaves a remainder of 1.
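The two expressions above, plus one common use of the remainder (checking whether a number is even), can be sketched as follows. The variable names are illustrative.

```python
no_remainder = 4 % 2      # 4 is evenly divisible by 2
remainder = 7 % 3         # 7 = 2 * 3 + 1

number = 10
is_even = number % 2 == 0  # an even number leaves no remainder when divided by 2

print(no_remainder, remainder, is_even)
```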

Python Integers

Python variables can be assigned different types of data. One supported data type is the integer. An integer is a number which can be written without a fractional part (no decimal). An integer can be a positive number, a negative number or the number 0 so long as there is no decimal portion. The number 0 represents an integer value but the same number written as 0.0 would represent a floating point number.
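A small sketch of the distinction between 0 and 0.0, using the built-in type() function to inspect each value. The variable names are illustrative.

```python
positive = 42
negative = -7
zero = 0          # an integer
zero_float = 0.0  # a floating point number, despite having the same value

print(type(zero))        # the int type
print(type(zero_float))  # the float type
```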

Python String Concatenation

Python supports the joining (concatenation) of strings using the + operator. The + operator is also used for mathematical addition. If both operands of the + operator are strings, concatenation is performed. If one operand is a string and the other is NOT, Python reports a TypeError. Multiple variables or literal strings can be joined together using the + operator. The concatenation process does not add any whitespace between the strings that are joined.
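A minimal sketch of concatenation, with illustrative names. It uses the built-in str() function to convert a number before joining it, which is one way to avoid the error described above.

```python
first = "Ada"
last = "Lovelace"

# No whitespace is added automatically, so the space is joined explicitly.
full_name = first + " " + last

age = 36
# Concatenating a string with an integer directly would fail,
# so the number is converted with str() first.
label = full_name + " is " + str(age)

print(label)
```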

Error Notification

The Python interpreter will report errors present in your code. For most error cases, the interpreter will display the line of code and then immediately below the code, display a line with the caret character ^ under the portion of the code where the error was detected.

Division By Zero Error

A ZeroDivisionError is reported by the Python interpreter when it detects a division operation is being performed and the denominator (bottom number) is 0. In mathematics, dividing a number by zero has no defined value, so Python treats this as an error condition and will report a ZeroDivisionError and display the line of code where the division occurred. This does not only happen when a 0 is specifically written in the code. This can also happen if a variable is used as the denominator and its value has been set to or changed to 0. In general, it is good practice to test the value of a variable before attempting to divide another number or variable by it.
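One way to test the denominator before dividing is sketched below. The helper name safe_divide and the choice to return None for a zero denominator are assumptions made for this example, not a standard function.

```python
def safe_divide(numerator, denominator):
    """Divide only when the denominator is not zero (hypothetical helper)."""
    if denominator == 0:
        # Returning None is one possible convention for "no defined value".
        return None
    return numerator / denominator


print(safe_divide(10, 2))
print(safe_divide(10, 0))
```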

Python Strings

A string is a sequence of characters (letters, numbers, whitespace or punctuation) enclosed by quotation marks. A Python string can be enclosed using either the double quotation mark " or the single quotation mark '.

If a string has to be broken into multiple lines, the backslash character \ can be used to indicate that the string continues on the next line.
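The sketch below shows both quoting styles and a backslash continuation. Note that the continued line starts at the left margin, because any leading spaces on it would become part of the string.

```python
double_quoted = "Hello, world"
single_quoted = 'Hello, world'

# The backslash at the end of the line means the string continues below.
long_text = "This string starts on one line \
and continues on the next."

print(double_quoted == single_quoted)
print(long_text)
```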

Python SyntaxError

A SyntaxError is reported by the Python interpreter when some portion of the code is incorrect. This can include misspelled keywords, missing or extra brackets or parentheses, incorrect operators, missing or extra quotation marks, or other conditions. The Python interpreter will display the line of code where the SyntaxError was detected and place a caret character under the point in the line where the interpreter believes the error exists.
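A file containing a syntax error cannot run at all, so this sketch keeps the broken code in a string and passes it to the built-in compile() function to trigger the error safely. The names broken_source and syntax_ok are illustrative.

```python
# Source text with a missing closing parenthesis.
broken_source = "print('Hello'"

try:
    compile(broken_source, "<example>", "exec")
    syntax_ok = True
except SyntaxError as error:
    syntax_ok = False
    # The exact wording of the message varies between Python versions.
    print("SyntaxError:", error.msg)
```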

Exponentiation

In addition to the basic operations of addition, subtraction, multiplication and division, Python supports an operator for exponentiation. That operator is written with two asterisks like so **. The format for exponentiation in Python is a number or variable followed by the operator ** followed by a number or variable which represents the power to raise the number. Both the number and the power can be floating point values.
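A few illustrative uses of the ** operator with integer and floating point operands. The variable names are made up for the example.

```python
squared = 5 ** 2       # 5 raised to the power 2
cubed = 2 ** 3         # 2 raised to the power 3
root = 9 ** 0.5        # a floating point power: the square root of 9
scaled = 1.5 ** 2      # a floating point base

print(squared, cubed, root, scaled)
```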

Python NameError

A NameError is reported by the Python interpreter when it detects what it believes to be a variable that is unknown. This can occur if a variable is used before it has been assigned a value, or if the variable name is spelled differently from the name used where it was defined. The Python interpreter will display the line of code where the NameError was detected and indicate which name was not defined.
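The misspelling case can be reproduced as in the sketch below. The try/except block is used here only so that the example keeps running after the error. The names are illustrative.

```python
total_score = 90

try:
    print(total_scroe)  # misspelled: this name was never defined
except NameError as error:
    caught_message = str(error)
    print("NameError:", caught_message)
```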

Floating Point Numbers

Python variables can be assigned different types of data. One supported data type is the floating point number. A floating point number is a value which contains a decimal portion. Floating point numbers are used to represent numbers which have fractional quantities. For example, a = 3/5 cannot be represented as an integer, so the variable a is assigned the floating point value 0.6.
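The example above can be checked directly. The variable half is an extra illustrative value.

```python
a = 3 / 5     # cannot be represented as an integer
half = 0.5    # a literal floating point value

print(a)
print(type(a))
```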

Python print() Function

The print() function is used to output text, numbers, or other printable information to the console. The print() function takes one or more arguments and will output those to the console. The arguments may be string or numeric literal values, variables, or the results returned from functions. If no arguments are provided, the print() function will output a blank line.
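A short sketch of the argument forms described above. The variable name is illustrative and len() stands in for any function that returns a value.

```python
name = "Ada"

print("Hello, world!")           # a string literal
print(42)                        # a numeric literal
print("Name:", name, len(name))  # several arguments, including a function result
print()                          # no arguments: outputs a blank line
```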

Learn Python: Syntax
Lesson 1 of 1
  1. Python is a programming language. Like other languages, it gives us a way to communicate ideas. In the case of a programming language, these ideas are “commands” that people use to communicate with…
  2. Ironically, the first thing we’re going to do is show how to tell a computer to ignore a part of a program. Text written in a program but not run by the computer is called a comment. Python inter…
  3. Now what we’re going to do is teach our computer to communicate. The gift of speech is valuable: a computer can answer many questions we have about “how” or “why” or “what” it is doing. In Python, …
  4. Computer programmers refer to blocks of text as strings. In our last exercise, we created the string “Hello world!”. In Python a string is either surrounded by double quotes (“Hello world”) or si…
  5. Programming languages offer a method of storing data for reuse. If there is a greeting we want to present, a date we need to reuse, or a user ID we need to remember we can create a variable whic…
  6. Humans are prone to making mistakes. Humans are also typically in charge of creating computer programs. To compensate, programming languages attempt to understand and explain mistakes made in their…
  7. Computers can understand much more than just strings of text. Python has a few numeric data types. It has multiple ways of storing numbers. Which one you use depends on your intended purpose for …
  8. Computers absolutely excel at performing calculations. The “compute” in their name comes from their historical association with providing answers to mathematical questions. Python performs addition…
  9. Variables that are assigned numeric values can be treated the same as the numbers themselves. Two variables can be added together, divided by 2, and multiplied by a third variable without Python di…
  10. Python can also perform exponentiation. In written math, you might see an exponent as a superscript number, but typing superscript numbers isn’t always easy on modern keyboards. Since this operatio…
  11. Python offers a companion to the division operator called the modulo operator. The modulo operator is indicated by % and gives the remainder of a division calculation. If the number is divisible, t…
  12. The + operator doesn’t just add two numbers, it can also “add” two strings! The process of combining two strings is called string concatenation. Performing string concatenation creates a brand ne…
  13. Python offers a shorthand for updating variables. When you have a number saved in a variable and want to add to the current value of the variable, you can use the += (plus-equals) operator. # Firs…
  14. Python strings are very flexible, but if we try to create a string that occupies multiple lines we find ourselves face-to-face with a SyntaxError. Python offers a solution: multi-line strings. By…
  15. In this lesson, we accomplished a lot of things! We instructed our computers to print messages, we stored these messages as variables, and we learned to update those messages depending on the part …
