CS 66: Introduction to Computer Science II
Problem Solving with Algorithms and Data Structures using Python
Sections 6.6 - 6.12: https://runestone.academy/ns/books/published/pythonds/SortSearch/toctree.html
The book covers 6 sorting algorithms: Bubble Sort, Selection Sort, Insertion Sort, Shell Sort, Merge Sort, and Quick Sort.
We did Bubble Sort last time. Now we'll do Insertion and Merge.
Static images are copied from the book for those who are not in class - though I recommend reading those sections and walking through the algorithms on paper yourself.
Idea: Think of the list as having two parts
Algorithm
current_index
in the range of 1 to $n-1$current_index
into where it goes within the sorted part of the list (which goes from index 0 to current_index-1
)def insertionSort(alist):
for index in range(1,len(alist)):
currentvalue = alist[index]
position = index
while position>0 and alist[position-1]>currentvalue:
alist[position]=alist[position-1]
position = position-1
alist[position]=currentvalue
alist = [54,26,93,17,77,31,44,55,20]
insertionSort(alist)
print(alist)
[17, 20, 26, 31, 44, 54, 55, 77, 93]
The following code
import time
import random
N = 10000
random_list = [random.randint(1,1000000) for i in range(N)]
random_list_copy1 = random_list[:]
random_list_copy2 = random_list[:]
start = time.time()
bubbleSort(random_list_copy1)
end = time.time()
print("Bubble sort:",(end-start),"seconds")
start = time.time()
insertionSort(random_list_copy2)
end = time.time()
print("Selection sort:",(end-start),"seconds")
Bubble sort: 7.338683843612671 seconds Selection sort: 3.95986008644104 seconds
Have each person in your group repeat this experiment and compare your results.
What happens on a list of size 50,000?
Idea: Split the list in half. Sort both halves (recursively!) and then merege the two sorted halves together.
Merge Step: Merging together two sorted lists take $O(n)$ time - you keep track of an index into each smaller list and one into the new list you're copying them into. At each step, you just copy the next item from one of the smaller lists.
How many times do you have to do that?
There are $\log_2(n)$ levels in the recursion since we split it in half each time (like binary search), so the entire algorithm takes $O(n \log(n))$
def mergeSort(alist):
print("Splitting ",alist)
if len(alist)>1:
mid = len(alist)//2
lefthalf = alist[:mid]
righthalf = alist[mid:]
mergeSort(lefthalf)
mergeSort(righthalf)
i=0
j=0
k=0
while i < len(lefthalf) and j < len(righthalf):
if lefthalf[i] <= righthalf[j]:
alist[k]=lefthalf[i]
i=i+1
else:
alist[k]=righthalf[j]
j=j+1
k=k+1
while i < len(lefthalf):
alist[k]=lefthalf[i]
i=i+1
k=k+1
while j < len(righthalf):
alist[k]=righthalf[j]
j=j+1
k=k+1
print("Merging ",alist)
alist = [54,26,93,17,77,31,44,55,20]
mergeSort(alist)
print(alist)
Splitting [54, 26, 93, 17, 77, 31, 44, 55, 20] Splitting [54, 26, 93, 17] Splitting [54, 26] Splitting [54] Merging [54] Splitting [26] Merging [26] Merging [26, 54] Splitting [93, 17] Splitting [93] Merging [93] Splitting [17] Merging [17] Merging [17, 93] Merging [17, 26, 54, 93] Splitting [77, 31, 44, 55, 20] Splitting [77, 31] Splitting [77] Merging [77] Splitting [31] Merging [31] Merging [31, 77] Splitting [44, 55, 20] Splitting [44] Merging [44] Splitting [55, 20] Splitting [55] Merging [55] Splitting [20] Merging [20] Merging [20, 55] Merging [20, 44, 55] Merging [20, 31, 44, 55, 77] Merging [17, 20, 26, 31, 44, 54, 55, 77, 93] [17, 20, 26, 31, 44, 54, 55, 77, 93]
sort
method?¶Python's sort method is a hybrid of Merge Sort and Insertion Sort called Timsort. You can read more here: https://en.wikipedia.org/wiki/Timsort
alist = [54,26,93,17,77,31,44,55,20]
alist.sort() #built-in sort uses Timsort algorithm
print(alist)
[17, 20, 26, 31, 44, 54, 55, 77, 93]
Comment out the print statements and add mergeSort
to your timing experiment. How does it compare? Then, add in the built-in Timsort - which is the best?
When sorting, we usually don't just have a big list of numbers - often we have objects that need to be sorted by a variety of criteria. Re-write the mergeSort
function so that we could sort our movie data by the World Sales field, and sort it from largest to smallest.
Here's an example of what the results should look like if you did it right (printing only the first 10 items in the sorted list). This code shows you how you can use the the built-in sort
method with data like this (you give it a function that returns the value of the thing you want to sort by), but you will actually change some of the code inside the mergeSort
function to make that one work.
import json
from pprint import pprint #for displaying the data in an easier-to-read way
def get_world_sales(x):
return x["World Sales"]
with open("HighestGrossingMovies.json") as moviefile:
movies = json.load(moviefile)
movies.sort(key=get_world_sales,reverse=True)
pprint(movies[:10])
[{'Distributor': 'Twentieth Century Fox', 'Domestic Sales': 760507625, 'Genre': "['Action', 'Adventure', 'Fantasy', 'Sci-Fi']", 'International Sales': 2086738578, 'MPAA Rating': 'PG-13', 'Release Date': 'December 16, 2009', 'Runtime': '2 hr 42 min', 'Summary': 'A paraplegic Marine dispatched to the moon Pandora on a unique ' 'mission becomes torn between following his orders and protecting ' 'the world he feels is his home.', 'Title': 'Avatar (2009)', 'World Sales': 2847246203}, {'Distributor': 'Walt Disney Studios Motion Pictures', 'Domestic Sales': 858373000, 'Genre': "['Action', 'Adventure', 'Drama', 'Sci-Fi']", 'International Sales': 1939128328, 'MPAA Rating': 'PG-13', 'Release Date': 'April 24, 2019', 'Runtime': '3 hr 1 min', 'Summary': 'After the devastating events of Avengers: Infinity War, the ' 'universe is in ruins. With the help of remaining allies, the ' "Avengers assemble once more in order to reverse Thanos' actions " 'and restore balance to the universe.', 'Title': 'Avengers: Endgame (2019)', 'World Sales': 2797501328}, {'Distributor': 'Paramount Pictures', 'Domestic Sales': 659363944, 'Genre': "['Drama', 'Romance']", 'International Sales': 1542283320, 'MPAA Rating': 'PG-13', 'Release Date': 'December 19, 1997', 'Runtime': '3 hr 14 min', 'Summary': 'A seventeen-year-old aristocrat falls in love with a kind but ' 'poor artist aboard the luxurious, ill-fated R.M.S. Titanic.', 'Title': 'Titanic (1997)', 'World Sales': 2201647264}, {'Distributor': 'Walt Disney Studios Motion Pictures', 'Domestic Sales': 936662225, 'Genre': "['Action', 'Adventure', 'Sci-Fi']", 'International Sales': 1132859475, 'MPAA Rating': 'PG-13', 'Release Date': 'December 16, 2015', 'Runtime': '2 hr 18 min', 'Summary': 'As a new threat to the galaxy rises, Rey, a desert scavenger, ' 'and Finn, an ex-stormtrooper, must join Han Solo and Chewbacca ' 'to search for the one hope of restoring peace.', 'Title': 'Star Wars: Episode VII - The Force Awakens (2015)', 'World Sales': 2069521700}, {'Distributor': 'Walt Disney Studios Motion Pictures', 'Domestic Sales': 678815482, 'Genre': "['Action', 'Adventure', 'Sci-Fi']", 'International Sales': 1369544272, 'MPAA Rating': None, 'Release Date': None, 'Runtime': '2 hr 29 min', 'Summary': 'The Avengers and their allies must be willing to sacrifice all ' 'in an attempt to defeat the powerful Thanos before his blitz of ' 'devastation and ruin puts an end to the universe.', 'Title': 'Avengers: Infinity War (2018)', 'World Sales': 2048359754}, {'Distributor': 'Universal Pictures', 'Domestic Sales': 652385625, 'Genre': "['Action', 'Adventure', 'Sci-Fi']", 'International Sales': 1018130819, 'MPAA Rating': 'PG-13', 'Release Date': 'June 10, 2015', 'Runtime': '2 hr 4 min', 'Summary': 'A new theme park, built on the original site of Jurassic Park, ' 'creates a genetically modified hybrid dinosaur, the Indominus ' 'Rex, which escapes containment and goes on a killing spree.', 'Title': 'Jurassic World (2015)', 'World Sales': 1670516444}, {'Distributor': 'Walt Disney Studios Motion Pictures', 'Domestic Sales': 543638043, 'Genre': "['Adventure', 'Animation', 'Drama', 'Family', 'Musical']", 'International Sales': 1119261396, 'MPAA Rating': 'PG', 'Release Date': 'July 11, 2019', 'Runtime': '1 hr 58 min', 'Summary': 'After the murder of his father, a young lion prince flees his ' 'kingdom only to learn the true meaning of responsibility and ' 'bravery.', 'Title': 'The Lion King (2019)', 'World Sales': 1662899439}, {'Distributor': 'Sony Pictures Entertainment (SPE)', 'Domestic Sales': 675813257, 'Genre': "['Action', 'Adventure', 'Fantasy', 'Sci-Fi']", 'International Sales': 868642706, 'MPAA Rating': None, 'Release Date': None, 'Runtime': '2 hr 28 min', 'Summary': "With Spider-Man's identity now revealed, Peter asks Doctor " 'Strange for help. When a spell goes wrong, dangerous foes from ' 'other worlds start to appear, forcing Peter to discover what it ' 'truly means to be Spider-Man.', 'Title': 'Spider-Man: No Way Home (2021)', 'World Sales': 1544455963}, {'Distributor': 'Walt Disney Studios Motion Pictures', 'Domestic Sales': 623357910, 'Genre': "['Action', 'Adventure', 'Sci-Fi']", 'International Sales': 895457605, 'MPAA Rating': 'PG-13', 'Release Date': 'April 25, 2012', 'Runtime': '2 hr 23 min', 'Summary': "Earth's mightiest heroes must come together and learn to fight " 'as a team if they are going to stop the mischievous Loki and his ' 'alien army from enslaving humanity.', 'Title': 'The Avengers (2012)', 'World Sales': 1518815515}, {'Distributor': 'Universal Pictures', 'Domestic Sales': 353007020, 'Genre': "['Action', 'Thriller']", 'International Sales': 1162334379, 'MPAA Rating': 'PG-13', 'Release Date': 'April 1, 2015', 'Runtime': '2 hr 17 min', 'Summary': 'Deckard Shaw seeks revenge against Dominic Toretto and his ' 'family for his comatose brother.', 'Title': 'Furious 7 (2015)', 'World Sales': 1515341399}]
A tree data structure is a collection of nodes that express a hierarchical relationship.
You can think of it like a linked list, but each node may be linked to more than one other node.
Node: the circles - may contain a key and other information
Edge: arrow representing relationship between two nodes
Parent/Child: the edge goes out from the parent node and into the child node. (Mammal is the parent of Carnivora, Carnivora is the child of Mammal)
Root: node with no parent (Animalia)
Leaf: nodee with no children (Human, Chimpanzee, etc.)
A Binary Search Tree is a tree that allows a search operation to be done similar to a binary search.
Like hash tables, they are commonly used to implement the set and map abstract data types.
The book's code: BST for implementing a map (https://runestone.academy/ns/books/published/pythonds/Trees/SearchTreeImplementation.html )
Remindeer: Set operations: add
, contains
/__contains__
, remove
, size
/__len__
BST Nodes have at most 2 children, called left child and right child
Nodes in left subtree have value less than this node
Nodes in right subtree have value more than this node
This is true for every node!