Basic usage of the Pandas library to download a dataset, explore its contents, clean up missing or invalid data, filter the data according to different criteria, and plot visualizations of the data.
Python is a widely used programming language with many useful libraries
Jupyter is an interactive notebook style of using a programming language (aka the "Kernel")
A notebook is separated into cells, which can be code cells or markdown cells
To select a cell: click on it with the mouse
To run the selected cell, click the Run button, press Ctrl+Enter, or click "Cell -> Run Cells" on the menubar
# This is a code cell: press Ctrl+Enter to execute the code in it
#
print("Hello World!")
Hello World!
This is a markdown cell, which can contain
Two modes of interacting with the active/selected cell:
Edit mode: press Enter with a cell selected
Command mode: press Escape
Lots of keyboard shortcuts are available. Press Escape to enter command mode, then the H key to see a list.
Some commonly used shortcuts:
A: insert a cell above the current cell
B: insert a cell below the current cell
M: convert the current cell to a markdown cell
Y: convert the current cell to a code cell
Shift+Enter: run the current cell and advance to the next cell
Kernel -> Restart (or command mode shortcut: 0 0)
Kernel -> Restart and Clear Output
Kernel -> Restart and Run All
# any lines starting with "#" are comments that Python ignores
#
# assign the number 12 to the variable "a":
#
a = 12
# any variable or object can be printed
print(a)
12
# display the type of an object
type(a)
int
# variables can be re-assigned, including to different types
a = "Hello!"
print(a)
Hello!
type(a)
str
# a list is an ordered container of objects (the objects don't have to have the same type)
# create one by listing items inside square brackets, separated by commas:
my_list = [1, 3, 88, -13, "hello"]
print(my_list)
[1, 3, 88, -13, 'hello']
type(my_list)
list
len(my_list)
5
# can reference an item in the list by its index: 0 is the first item
print(my_list[0])
1
# can also use negative indices: -1 is the last item, -2 the second-to-last, etc
print(my_list[-1])
hello
# can use slicing to get a subset of the list: here elements with index 1 up to (but not including) index 3:
print(my_list[1:3])
[3, 88]
# can omit starting element of slice: defaults to first element
print(my_list[:2])
[1, 3]
# can omit end element of slice: defaults to number of elements
print(my_list[3:])
[-13, 'hello']
# can omit start and end elements of slice: get all elements
print(my_list[:])
[1, 3, 88, -13, 'hello']
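One consequence of slicing worth seeing in action: a slice builds a new list rather than a view of the old one, so `my_list[:]` can serve as a quick copy. A minimal sketch, where the name `list_copy` is only illustrative:

```python
my_list = [1, 3, 88, -13, "hello"]

# a full slice builds a new list containing the same items
list_copy = my_list[:]

# changing the copy leaves the original list untouched
list_copy[0] = 999

print(my_list[0])
print(list_copy[0])
```

Note that the copy is shallow: the two lists are separate, but any mutable items inside them are shared.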
# can add two lists together: this concatenates them into a single long list
print(my_list + [5, 6, 7])
[1, 3, 88, -13, 'hello', 5, 6, 7]
# can iterate over the items in a list
for item in my_list:
    print(item)
1
3
88
-13
hello
# a dictionary is a collection of key-value pairs (insertion order is preserved since Python 3.7)
# create one by listing key:value pairs inside curly brackets, separated by commas
my_dict = {"name": "Bob", "age": 6}
print(my_dict)
{'name': 'Bob', 'age': 6}
type(my_dict)
dict
len(my_dict)
2
# can look up a value using its key
print(my_dict["name"])
Bob
# can add a key-value pair to the dictionary by assigning a value to a key
my_dict["sizes"] = [1, 2, 3]
print(my_dict)
{'name': 'Bob', 'age': 6, 'sizes': [1, 2, 3]}
# assigning to an existing key overwrites the old value with the new one
my_dict["sizes"] = [5, 10, 24]
print(my_dict)
{'name': 'Bob', 'age': 6, 'sizes': [5, 10, 24]}
# can iterate over dictionary items using dict.items()
for key, value in my_dict.items():
    print(key, value)
name Bob
age 6
sizes [5, 10, 24]
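Looking up a key that is not in the dictionary with square brackets raises a `KeyError`. Two common ways to handle this are sketched below, using a hypothetical key `"height"` that is not part of the example dictionary; the variable names are illustrative only:

```python
my_dict = {"name": "Bob", "age": 6}

# "in" checks whether a key exists in the dictionary
has_height = "height" in my_dict
print(has_height)

# dict.get returns a fallback value instead of raising KeyError
height = my_dict.get("height", "unknown")
print(height)

# for a key that does exist, dict.get behaves like a normal lookup
print(my_dict.get("name", "unknown"))
```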
# functions are defined using the def keyword
def my_function():
    print("hi")
my_function()
hi
# functions can take arguments
def my_function(name):
    print("hi", name)
my_function("Liam")
hi Liam
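The functions above only print. A function can also hand a value back to the caller with `return`, and arguments can be given default values. A small sketch, where `make_greeting` and its `greeting` parameter are made-up names for illustration:

```python
def make_greeting(name, greeting="hi"):
    # build the message and return it instead of printing it
    return greeting + " " + name

# the returned value can be stored in a variable and used later
message = make_greeting("Liam")
print(message)

# a default argument can be overridden by name
print(make_greeting("Liam", greeting="hello"))
```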
# import a library, and (optionally) give it a shorter name
import numpy as np
my_list = [1, 2, 3, 4, 5]
# library functions accessed using library_name.function
# here we create a numpy array from a list
my_array = np.array(my_list)
print(my_array)
[1 2 3 4 5]
type(my_array)
numpy.ndarray
# apply the numpy `sqrt` function to every element of the array
np.sqrt(my_array)
array([1.        , 1.41421356, 1.73205081, 2.        , 2.23606798])
# display help about this sqrt function (the leading "?" is Jupyter/IPython syntax, not plain Python)
?np.sqrt
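Element-wise behaviour is one reason to convert a list to an array in the first place. The sketch below contrasts the two for multiplication; the names `doubled_list` and `doubled_array` are illustrative only:

```python
import numpy as np

my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)

# multiplying a list by 2 repeats its contents
doubled_list = my_list * 2

# multiplying an array by 2 doubles every element
doubled_array = my_array * 2

print(doubled_list)
print(doubled_array)
```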
np.mean(my_array)
3.0
np.std(my_array)
1.4142135623730951
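The value above is the population standard deviation, which is what `np.std` computes by default (`ddof=0`, dividing by the number of elements N). Passing `ddof=1` gives the sample standard deviation, which divides by N - 1 instead. A minimal sketch with illustrative variable names:

```python
import numpy as np

my_array = np.array([1, 2, 3, 4, 5])

# default: population standard deviation (divide by N)
population_std = np.std(my_array)

# ddof=1: sample standard deviation (divide by N - 1)
sample_std = np.std(my_array, ddof=1)

print(population_std)
print(sample_std)
```

The sample version is always slightly larger, and the gap shrinks as the array gets longer.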