Let's discuss the ChatGPT study
Problem Solving with Algorithms and Data Structures using Python, Section 1.8: https://runestone.academy/ns/books/published//pythonds/Introduction/GettingStartedwithData.html
Also see Chapter 11 of Think Python (https://greenteapress.com/thinkpython2/html/thinkpython2012.html)
A set is like a list, but the items don't have any order.
my_list = ["bread","milk","rice","butter","eggs","apples"]
my_set = {"bread","milk","rice","butter","eggs","apples"}
print(my_list)
print(my_set)
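['bread', 'milk', 'rice', 'butter', 'eggs', 'apples']
{'bread', 'eggs', 'rice', 'milk', 'butter', 'apples'}
(The set may print in a different order when you run it, because Python does not keep set items in any particular order.)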
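One way to see that a set has no order is to compare two collections that hold the same items listed in a different order. The sketch below uses small made-up grocery lists and compares two lists and two sets built from the same items:

```python
# The same three items, listed in two different orders
list_a = ["bread", "milk", "rice"]
list_b = ["rice", "bread", "milk"]

set_a = {"bread", "milk", "rice"}
set_b = {"rice", "bread", "milk"}

# Lists care about order, so these two lists are not equal
print(list_a == list_b)   # False

# Sets only care about which items are present, so these two sets are equal
print(set_a == set_b)     # True
```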
Because sets have no order, there is an add method instead of append and insert.
my_list = ["bread","milk","rice","butter","eggs","apples"]
my_set = {"bread","milk","rice","butter","eggs","apples"}
my_list.append("bananas")
my_set.add("bananas")
print(my_list)
print(my_set)
['bread', 'milk', 'rice', 'butter', 'eggs', 'apples', 'bananas']
{'bread', 'bananas', 'eggs', 'rice', 'milk', 'butter', 'apples'}
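If you want to confirm that append and insert really are missing for sets, Python's built-in hasattr function reports whether an object has an attribute (such as a method) with a given name, and calling a method that doesn't exist raises an AttributeError. A small sketch:

```python
my_set = {"bread", "milk", "rice", "butter", "eggs", "apples"}

# Sets have an add method...
print(hasattr(my_set, "add"))      # True

# ...but no append or insert, since a set has no "end" and no positions
print(hasattr(my_set, "append"))   # False
print(hasattr(my_set, "insert"))   # False

# Trying to call a missing method raises an AttributeError
try:
    my_set.append("bananas")
except AttributeError as err:
    print("Error:", err)
```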
The following code has some things you can do with a list. Discuss what each of them does, and then try to see if you can do it with a set too. Write down any differences there are between how they work as a list compared with a set.
my_list = ["bread","milk","rice","butter","eggs","apples"]
my_set = {"bread","milk","rice","butter","eggs","apples"}
print( len(my_list) )
print( my_list[2] )
print( "milk" in my_list)
my_list.remove("eggs")
print( my_list )
print(my_list.pop())
print( my_list )
for item in my_list:
    print(item)
A dictionary stores a collection of key-value pairs.
The syntax works like a list's, except that you put a key in the brackets instead of an index.
cs_dept_phonebook = {
"Porter" : "3041",
"Case" : "4618",
"Manley" : "2177",
"Urness" : "2188",
"Rieck" : "3795"
}
print( cs_dept_phonebook["Manley"] ) #look up Manley's phone number
2177
See if you can guess/remember the syntax for adding, updating, and removing items in a dictionary. Try the following:
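After you have tried your guesses, here is one way to check them, using the cs_dept_phonebook dictionary from above. The name "Smith" and the numbers "1234" and "4619" are made up purely for illustration:

```python
cs_dept_phonebook = {
    "Porter": "3041",
    "Case": "4618",
    "Manley": "2177",
    "Urness": "2188",
    "Rieck": "3795",
}

# Adding: assign to a key that isn't in the dictionary yet (made-up entry)
cs_dept_phonebook["Smith"] = "1234"

# Updating: assign to a key that already exists; the old value is replaced
cs_dept_phonebook["Case"] = "4619"

# Removing: del deletes a key and its value
del cs_dept_phonebook["Rieck"]

# pop also removes a key, and returns the value that was stored there
porter_number = cs_dept_phonebook.pop("Porter")

print(porter_number)        # 3041
print(cs_dept_phonebook)    # {'Case': '4619', 'Manley': '2177', 'Urness': '2188', 'Smith': '1234'}
```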
Dictionaries are sometimes used as a map where all the keys are the same kind of thing and all the values are the same kind of thing.
Dictionaries are sometimes used as a record, where each of the keys represents a different piece of data about the same item.
cs_dept_phonebook = {
"Porter" : "3041",
"Case" : "4618",
"Manley" : "2177",
"Urness" : "2188",
"Rieck" : "3795"
} #a map
manley_record = {
"name" : "Eric Manley",
"email" : "eric.manley@drake.edu",
"building" : "Collier-Scripps Hall",
"room" : 327,
"phone" : "(515) 271-2177"
} #a record
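The difference usually shows up in how the two are used: with a map you often loop over all the key-value pairs, while with a record you usually look up specific fields by name. A short sketch using shortened copies of the two dictionaries above:

```python
# A map (shortened): every key is a name, every value is an extension
cs_dept_phonebook = {"Porter": "3041", "Case": "4618", "Manley": "2177"}

# A record (shortened): each key describes a different fact about one person
manley_record = {
    "name": "Eric Manley",
    "email": "eric.manley@drake.edu",
    "room": 327,
}

# Map: all the pairs mean the same kind of thing, so looping over them makes sense
for name, extension in cs_dept_phonebook.items():
    print(name, "is at extension", extension)

# Record: each key means something different, so pick out the fields you need
print(manley_record["name"], "is in room", manley_record["room"])
```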
And sometimes you'll even see a list or map of records:
contact_list = [{"name" : "Eric Manley","email" : "eric.manley@drake.edu","building" : "Collier-Scripps Hall","room" : 327,"phone" : "(515) 271-2177"},
{"name" : "Timothy Urness","email" : "timothy.urness@drake.edu","building" : "Collier-Scripps Hall","room" : 307,"phone" : " (515) 271-2118 "},
{"name" : "Meredith Moore","email" : "meredith.moore@drake.edu ","building" : "Collier-Scripps Hall","room" : 325}
]
print( contact_list[2]["name"] )
Meredith Moore
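Notice that the third record in contact_list has no "phone" key, so contact_list[2]["phone"] would raise a KeyError. When some records might be missing a field, the dictionary get method lets you supply a default value instead. In this sketch the records are shortened to two fields, and the default text "no phone listed" is just an illustrative choice:

```python
contact_list = [
    {"name": "Eric Manley", "phone": "(515) 271-2177"},
    {"name": "Timothy Urness", "phone": " (515) 271-2118 "},
    {"name": "Meredith Moore"},  # no "phone" key in this record
]

for contact in contact_list:
    # get returns the default if the key is missing, instead of raising KeyError
    phone = contact.get("phone", "no phone listed")
    # strip removes the extra spaces around some of the phone numbers
    print(contact["name"] + ": " + phone.strip())
```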
JSON stands for JavaScript Object Notation.
JavaScript is a different programming language, and JSON came from it.
JSON is a very popular format used to store and transmit data.
In JSON, strings must be written in double quotes "like this", not single quotes 'not like this'.
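You can see the quoting rule for yourself with Python's built-in json module: json.loads reads JSON from a string, and it rejects single-quoted strings by raising a json.JSONDecodeError. A small sketch with a made-up record:

```python
import json

# Valid JSON: keys and strings use double quotes
good = '{"name": "Eric Manley", "room": 327}'
record = json.loads(good)   # becomes a Python dictionary
print(record["name"])       # Eric Manley

# Not valid JSON: single quotes are not allowed
bad = "{'name': 'Eric Manley', 'room': 327}"
try:
    json.loads(bad)
except json.JSONDecodeError as err:
    print("Not valid JSON:", err)
```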
Download the file HighestGrossingMovies.json and put it in the folder where you normally keep your .py files. The data in this file is originally from here (though I restructured it for use in this course): https://www.kaggle.com/sanjeetsinghnaik/top-1000-highest-grossing-movies
Open the file in a plain text editor (e.g., Notepad, TextEdit, or even VS Code) and look at its contents. In your group, discuss what kind of data you see in this file.
Then create a .py file on your computer with the following code, which shows how to load a JSON file in Python.
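import json

# Open the JSON file (JSON files are normally UTF-8) and parse it into Python lists/dictionaries
with open("HighestGrossingMovies.json", encoding="utf-8") as moviefile:
    movies = json.load(moviefile)

print(movies)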
If you run this in VS Code, it may not be able to find your file, because the terminal often starts in your home directory by default. To make the terminal run from the folder where your file is, type the cd command (which stands for "change directory") followed by a space, and then drag the folder containing the .json file onto the terminal.
This should fill in the path to that folder. When you press Enter, you should see the folder's name in the terminal prompt.
Now you should be able to run the program as normal and see the output.
For the program above, answer the following questions:
What kind of value is stored in the movies variable?
How is the data in movies organized? For example, is it a list of lists? A list of dictionaries? Do the records contain other dictionaries? Do the records contain lists? What are the types of the keys and values within each record?
Add this code to your file, and run it. Describe what it does.
for m in movies:
    print(m["Title"])
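To help answer the questions about how movies is organized, you can ask Python directly with the built-in type function. The sketch below uses a tiny hypothetical stand-in for movies: one record with a few keys, with values copied from the example output near the end of this handout, so it runs without the file. With the real file loaded, you would run the same checks on movies itself. Note that in that example output, Genre appears as a string that looks like a list, which is exactly the kind of thing this check reveals.

```python
# Hypothetical stand-in for the real movies data (one record, a few of its keys)
movies = [
    {
        "Title": "Frozen II (2019)",
        "World Sales": 1450026933,
        "Genre": "['Adventure', 'Animation', 'Comedy', 'Family', 'Fantasy', 'Musical']",
        "Runtime": "1 hr 43 min",
    }
]

print(type(movies))              # <class 'list'>
print(type(movies[0]))           # <class 'dict'>
print(list(movies[0].keys()))    # the keys of the first record

# Check the type of each value in the first record
for key, value in movies[0].items():
    print(key, "->", type(value).__name__)
```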
Write the code that will print the titles of all movies that contain "Star Wars" in their name.
Hint: You can check whether one string is a substring of another with code like this:
"Star Wars" in "Star Wars: Episode VII - The Force Awakens (2015)"
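The in operator on two strings evaluates to True or False, so it can go directly in an if statement. A few quick checks (the extra search strings are just for illustration) also show that the comparison is case-sensitive:

```python
title = "Star Wars: Episode VII - The Force Awakens (2015)"

print("Star Wars" in title)          # True
print("Force" in title)              # True
print("star wars" in title)          # False: the check is case-sensitive

# lower() makes a lowercase copy of the title, for a case-insensitive check
print("star wars" in title.lower())  # True
```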
Write the code that will print the names of all comedies (i.e., movies which contain "Comedy" as one of their items in the Genre list).
Write the code that will determine which comedy had the highest world sales (note that the file seems to be sorted by domestic sales, not world sales, so you will have to loop through all the records).
Write a function called most_popular_in_genre which takes in a list of movie records (in the same format as the movies variable above) and the name of a movie genre, and returns the record of the movie from that genre with the highest world sales.
For example, if I call the function like this:
print( most_popular_in_genre(movies,"Comedy") )
it should result in
{'Title': 'Frozen II (2019)', 'Summary': "Anna, Elsa, Kristoff, Olaf and Sven leave Arendelle to travel to an ancient, autumn-bound forest of an enchanted land. They set out to find the origin of Elsa's powers in order to save their kingdom.", 'Distributor': 'Walt Disney Studios Motion Pictures', 'Release Date': 'November 20, 2019', 'Domestic Sales': 477373578, 'International Sales': 972653355, 'World Sales': 1450026933, 'Genre': "['Adventure', 'Animation', 'Comedy', 'Family', 'Fantasy', 'Musical']", 'Runtime': '1 hr 43 min', 'MPAA Rating': 'PG'}
Make sure not to refer to the global movies variable inside your function; use the list that is passed in as a parameter instead. This function should still work if I copied it into a new .py file and loaded it with a different data set that has the same form (for example, an updated top 1000 movie data set in a few years).
Make sure to save your code for this exercise - we may continue working on it and turn it in for an assignment later.