Sets, Dictionaries, and JSON¶

CS 66: Introduction to Computer Science II¶

Before we get started¶

Let's discuss the ChatGPT study

  • Informed consent
  • Attitudes survey: https://forms.gle/yWbhnT6BthvHNgxa6

References for this lecture¶

Problem Solving with Algorithms and Data Structures using Python, Section 1.8: https://runestone.academy/ns/books/published//pythonds/Introduction/GettingStartedwithData.html

Also see Chapter 11 of Think Python ( https://greenteapress.com/thinkpython2/html/thinkpython2012.html )

Sets¶

A set is like a list, but its items have no order and each item can appear only once.

In [6]:
my_list = ["bread","milk","rice","butter","eggs","apples"]
my_set = {"bread","milk","rice","butter","eggs","apples"}

print(my_list)
print(my_set)
['bread', 'milk', 'rice', 'butter', 'eggs', 'apples']
{'bread', 'eggs', 'rice', 'milk', 'butter', 'apples'}
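
As a small sketch of what "no order" means in practice (the variable names here are made up for illustration): building a set from a list drops any repeated items, and two sets count as equal whenever they hold the same items, no matter what order they were written in.

```python
# Building a set from a list drops repeated items.
shopping = ["milk", "eggs", "milk", "bread", "eggs"]
unique_items = set(shopping)
print(len(shopping))      # 5
print(len(unique_items))  # 3

# Order does not matter when comparing sets, but it does for lists.
print({"milk", "eggs"} == {"eggs", "milk"})  # True
print(["milk", "eggs"] == ["eggs", "milk"])  # False
```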

Adding items in sets vs. lists¶

Because sets have no order, there is an add method instead of append and insert.

In [7]:
my_list = ["bread","milk","rice","butter","eggs","apples"]
my_set = {"bread","milk","rice","butter","eggs","apples"}

my_list.append("bananas")
my_set.add("bananas")

print(my_list)
print(my_set)
['bread', 'milk', 'rice', 'butter', 'eggs', 'apples', 'bananas']
{'bread', 'bananas', 'eggs', 'rice', 'milk', 'butter', 'apples'}
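
One more thing worth trying with add, sketched below with a shortened example set: adding an item that is already in the set changes nothing. The update method, which is not used elsewhere in these notes, adds several items at once.

```python
my_set = {"bread", "milk", "rice"}

my_set.add("milk")                # already in the set, so nothing changes
print(len(my_set))                # 3

my_set.update(["eggs", "rice"])   # adds "eggs"; "rice" is already there
print(len(my_set))                # 4
```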

Group Activity Problem 1¶

The following code has some things you can do with a list. Discuss what each of them does, and then try to see if you can do it with a set too. Write down any differences there are between how they work as a list compared with a set.

In [ ]:
my_list = ["bread","milk","rice","butter","eggs","apples"]
my_set = {"bread","milk","rice","butter","eggs","apples"}

print( len(my_list) )
print( my_list[2] )
print( "milk" in my_list)
my_list.remove("eggs")
print( my_list )
print(my_list.pop())
print( my_list )

for item in my_list:
    print(item)

Dictionaries¶

A dictionary stores a collection of key-value pairs.

The syntax works like list indexing, except you put the key in the brackets instead of an index.

In [8]:
cs_dept_phonebook = { 
                        "Porter"  : "3041",
                        "Case"    : "4618",
                        "Manley"  : "2177",
                        "Urness"  : "2188",
                        "Rieck"   : "3795"
                    }

print( cs_dept_phonebook["Manley"] ) #look up Manley's phone number
2177
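
Looking up a key that is not in the dictionary with brackets raises a KeyError. The sketch below shows two common ways to guard against that, using a shortened phonebook and a made-up name ("Turing") that is not in it.

```python
cs_dept_phonebook = {"Porter": "3041", "Case": "4618", "Manley": "2177"}

# Check whether a key is present before looking it up.
print("Manley" in cs_dept_phonebook)   # True
print("Turing" in cs_dept_phonebook)   # False

# get returns None (or a default you choose) instead of raising an error.
print(cs_dept_phonebook.get("Turing"))             # None
print(cs_dept_phonebook.get("Turing", "unknown"))  # unknown
```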

Group Activity Problem 2¶

See if you can guess/remember the syntax for adding, updating, and removing items in a dictionary. Try the following:

  1. There's a mistake in Urness' phone number. Change it to "2118".
  2. Some new professors are missing. Add "Moore" into the dictionary with phone number "5555" (this is just for an example - they actually haven't assigned her a phone number yet... I don't know what is taking so long)
  3. Rieck has retired. Remove his entry from the phone book (hint: try remove and/or pop)
In [ ]:
 

Maps vs. Records¶

Dictionaries are sometimes used as a map where all the keys are the same kind of thing and all the values are the same kind of thing.

Dictionaries are sometimes used as a record where each key represents a different piece of data about the same item.

In [ ]:
cs_dept_phonebook = { 
                        "Porter"  : "3041",
                        "Case"    : "4618",
                        "Manley"  : "2177",
                        "Urness"  : "2188",
                        "Rieck"   : "3795"
                    } #a map

manley_record = {
                    "name" : "Eric Manley",
                    "email" : "eric.manley@drake.edu",
                    "building" : "Collier-Scripps Hall",
                    "room" : 327,
                    "phone" : "(515) 271-2177"
                }

Sometimes you'll even see a list or map of records:

In [9]:
contact_list = [{"name" : "Eric Manley","email" : "eric.manley@drake.edu","building" : "Collier-Scripps Hall","room" : 327,"phone" : "(515) 271-2177"},
                {"name" : "Timothy Urness","email" : "timothy.urness@drake.edu","building" : "Collier-Scripps Hall","room" : 307,"phone" : "(515) 271-2118"},
                {"name" : "Meredith Moore","email" : "meredith.moore@drake.edu","building" : "Collier-Scripps Hall","room" : 325}
               ]

print( contact_list[2]["name"] )
Meredith Moore
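
Records in a list do not always have the same keys. In the contact list above, for instance, the last record has no "phone" entry. The following is a minimal sketch of looping over a shortened list of records and handling a missing key with get; the fallback text is just a placeholder chosen for this example.

```python
contact_list = [
    {"name": "Eric Manley", "room": 327, "phone": "(515) 271-2177"},
    {"name": "Meredith Moore", "room": 325},  # no "phone" key
]

for contact in contact_list:
    # contact["phone"] would raise KeyError for the second record
    phone = contact.get("phone", "no phone listed")
    print(contact["name"], "-", phone)
```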

JSON¶

JSON stands for JavaScript Object Notation

JavaScript is a different programming language, and JSON came from it

JSON is a very popular format for storing and transmitting data (for example, in databases and in communication with a server).

  • human-readable
  • uses nearly the same notation as Python for lists and dictionaries 😀
    • no tuples 🙁
    • only recognizes double-quote strings "like this", 'not like this'
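
To see these points in action, here is a small sketch using Python's built-in json module. The record is made up for illustration (the "courses" key is hypothetical). json.dumps turns a Python object into JSON text and json.loads turns JSON text back into Python objects.

```python
import json

# A made-up record; the tuple is there to show what JSON does with it.
record = {"name": "Eric Manley", "room": 327, "courses": ("CS 65", "CS 66")}

text = json.dumps(record)   # Python object -> JSON text
print(text)                 # strings come out with double quotes

restored = json.loads(text)        # JSON text -> Python object
print(type(restored["courses"]))   # the tuple came back as a list

# Single-quoted strings are not valid JSON.
try:
    json.loads("{'name': 'Eric Manley'}")
except json.JSONDecodeError:
    print("single-quoted strings are not valid JSON")
```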

Group Activity Problem 3¶

Download the file HighestGrossingMovies.json and put it in the folder where you normally keep your .py files. The data in this file is originally from here (though I restructured it for use in this course): https://www.kaggle.com/sanjeetsinghnaik/top-1000-highest-grossing-movies

Open the file in a plain text editor (e.g., Notepad, TextEdit, or even VS Code) and look at what the contents of the file look like. In your group, discuss what kind of data you see in this file.

Then create a .py file on your computer with the following code, which is how you can load JSON files in Python.

In [ ]:
import json

with open("HighestGrossingMovies.json") as moviefile:
    movies = json.load(moviefile)
    
print(movies)

If you run this in VS Code, it will likely not be able to find your file - the terminal runs from your home directory by default. So, to change the terminal to run from the location where your file is, type the cd command (which stands for change directory) and then drag the folder where you put the .json file onto the terminal.

[Image: cd.png]

This should fill in the path to that directory, and then when you hit enter, you should see that the terminal is set to that directory's name.

[Image: aftercd.png]

Now you should be able to run the program as normal and see the output.
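
If you are not sure which folder your program is running from, you can ask Python directly. The sketch below prints the current working directory, which is where a plain file name like "HighestGrossingMovies.json" is looked up. It also defines a small helper, load_json_from_folder, which is a hypothetical name invented for this example, that builds the full path to a file in a folder you specify.

```python
import json
from pathlib import Path

def load_json_from_folder(folder, filename):
    """Load a JSON file located in the given folder."""
    path = Path(folder) / filename
    with open(path) as f:
        return json.load(f)

# Plain file names passed to open() are looked up in this folder.
print(Path.cwd())
```

Inside a .py file, one possible use is `movies = load_json_from_folder(Path(__file__).parent, "HighestGrossingMovies.json")`, which looks for the data file next to the script instead of in the terminal's current folder.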

Group Activity Problem 4¶

For the program above, answer the following questions:

  1. What is the type of the movies variable?
  2. How could you get it to print just the first record? Change the code and run it.
  3. How could you get it to print the first 5 records? Change the code and run it.
  4. Describe the structure of the data in movies - i.e., is it a list of lists? a list of dictionaries? Do the records contain other dictionaries in them? Do records contain lists? What are the types of the keys and values within each record?

Group Activity Problem 5¶

Add this code to your file, and run it. Describe what it does.

In [ ]:
for m in movies:
    print(m["Title"])

Group Activity Problem 6¶

Write the code that will print out the title of all movies that contain "Star Wars" in their name.

Hint: You can check if one string is a substring of another with code like

In [ ]:
"Star Wars" in "Star Wars: Episode VII - The Force Awakens (2015)"

Group Activity Problem 7¶

Write the code that will print the names of all comedies (i.e., movies which contain "Comedy" as one of their items in the Genre list).

Group Activity Problem 8¶

Write the code that will determine which comedy had the highest world sales (note that the file seems to be sorted by domestic sales, not world sales, so you will have to loop through all the records).

Group Activity Problem 9¶

Write a function called most_popular_in_genre which takes in a list of movie records (in the same format as the movies variable above) and the name of a movie genre and returns the record of the movie from that genre with the highest world sales.

For example, if I call the function like this:

In [ ]:
print( most_popular_in_genre(movies,"Comedy") )

it should result in

{'Title': 'Frozen II (2019)', 'Summary': "Anna, Elsa, Kristoff, Olaf and Sven leave Arendelle to travel to an ancient, autumn-bound forest of an enchanted land. They set out to find the origin of Elsa's powers in order to save their kingdom.", 'Distributor': 'Walt Disney Studios Motion Pictures', 'Release Date': 'November 20, 2019', 'Domestic Sales': 477373578, 'International Sales': 972653355, 'World Sales': 1450026933, 'Genre': "['Adventure', 'Animation', 'Comedy', 'Family', 'Fantasy', 'Musical']", 'Runtime': '1 hr 43 min', 'MPAA Rating': 'PG'}

Make sure not to refer to the global movies variable inside your function. This function should still work if I copied it into a new .py file and loaded it with a different data set that has the same form (for example, if I used an updated top 1000 movie data set in a few years).

Make sure to save your code for this exercise - we may continue working on it and turn it in for an assignment later.