This assignment appears in the Nifty Assignment track at the 2023 CCSC Central Plains conference.
Students retrieve data from a Web API and use it to display an interactive data visualization. This example uses an easy-to-use COVID data source, though it could be adapted to many other free APIs. The assignment is broken into three parts, and an instructor may choose to use one, two, or all three.
requests package to retrieve live data from a Web API, returned as nested structures of lists and dictionaries
Overall, there are several assignment benefits of utilizing these tools:
This assignment is targeted at students near the end of CS1 or beginning of CS2. It is especially relevant for students interested in data science. Students should have experience with basic control structures (if statements and loops) and know how to access individual items in lists and dictionaries. Part 3 assumes some prior knowledge of the Dash framework for building web applications in Python. As this is not a common technology used in introductory programming courses, I am also including a short walk-through of Dash basics that can be done as part of this assignment or as an in-class lab.
While the assignment utilizes several technologies which are new to most students (Web APIs, visualization libraries, web frameworks), the assignment is designed to walk through the use of these for beginners. The programming challenge comes from processing of the data, which is structured as nested lists and dictionaries.
If the instructor wants to emphasize the use of Python libraries, they should add exercises where students have to search the documentation to figure out how to make various changes (e.g., different endpoints from the API, different kinds of visualizations, adjusting parameters for library functions, etc.). Data from other Web APIs (I recommend NASA and Yelp) can also be easily substituted, though they will require the additional steps of requesting and using API keys.
We will use the requests, plotly, and dash packages in this assignment. The pandas module will also need to be installed, though we will not use it directly.
One way to install these is to use the pip package installer. If you've never used pip before, you may need to install it with a command like this (see https://docs.python.org/3/library/ensurepip.html ).
python3 -m ensurepip --upgrade
Then, to install these packages, run each of the following commands.
python3 -m pip install requests
python3 -m pip install pandas
python3 -m pip install plotly
python3 -m pip install dash
You may need to replace the python3 command in the examples above with the command you use for executing Python 3 (e.g., python or python3.8).
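One quick way to confirm that the installs worked is to ask Python whether it can find each package. The sketch below uses only the standard library; missing_packages is a hypothetical helper name chosen for this illustration, not part of the assignment.

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the names from `names` that Python cannot find."""
    return [name for name in names if find_spec(name) is None]

# Packages this assignment relies on
needed = ["requests", "pandas", "plotly", "dash"]
missing = missing_packages(needed)
if missing:
    print("Still need to install:", ", ".join(missing))
else:
    print("All packages found.")
```

If anything is reported as missing, rerun the matching pip command with the same Python command you use to run your programs.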
This document will also use the pprint module to display data nicely. For more info, see https://docs.python.org/3/library/pprint.html .
Now we're going to try requesting some JSON data directly from a server on the Internet.
One source for interesting data: Web APIs - application programming interfaces that allow your program to access data through HTTP requests.
There are many APIs you can use to build your own applications: NASA, Yelp!, Wikipedia, Associated Press, Weather, IMDB, etc.
We'll try out an easy-to-use API of COVID data: https://covid19api.com/
Documentation here: https://documenter.getpostman.com/view/10808728/SzS8rjbc
Try this to request data about COVID in the US:
import requests
from pprint import pprint
response = requests.get("https://api.covid19api.com/live/country/united-states")
print(response)
<Response [200]>
If the response that you print is
<Response [200]>
then the request worked, since 200 is the HTTP status code for "it worked". If you see any other code, it usually means something went wrong.
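To get a feel for what the different codes mean, Python's standard library includes a table of them. The sketch below does not contact any server; describe_status and is_success are hypothetical helper names used only for this illustration. With the requests package, the number itself is available as response.status_code.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short description such as '200 OK' for an HTTP status code."""
    return f"{code} {HTTPStatus(code).phrase}"

def is_success(code):
    """Codes from 200 to 299 mean the request succeeded."""
    return 200 <= code < 300

# Look at one success code and two common error codes
for code in (200, 404, 500):
    outcome = "success" if is_success(code) else "problem"
    print(describe_status(code), "->", outcome)
```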
Next, we'll get the data from the response using the .json() method. This particular request returns a list of dictionaries with information about COVID numbers in different US geographic areas. We'll print the first 5 to get an idea of what it looks like.
import requests
from pprint import pprint
response = requests.get("https://api.covid19api.com/live/country/united-states")
data = response.json() #returns a list of dictionaries
pprint(data[0:5]) #let's look at the first 5 entries in the list
[{'Active': 0, 'City': '', 'CityCode': '', 'Confirmed': 0, 'Country': 'United States of America', 'CountryCode': 'US', 'Date': '1970-01-01T00:00:00Z', 'Deaths': 0, 'ID': '9cb646d4-fb86-4b74-8c16-df1f8d0bedaf', 'Lat': '18.35', 'Lon': '-64.93', 'Province': 'United States Virgin Islands', 'Recovered': 0}, {'Active': 0, 'City': '', 'CityCode': '', 'Confirmed': 0, 'Country': 'United States of America', 'CountryCode': 'US', 'Date': '1970-01-01T00:00:00Z', 'Deaths': 0, 'ID': 'f27bd486-5cf5-4424-ae99-45b123578efd', 'Lat': '-14.27', 'Lon': '-170.13', 'Province': 'American Samoa', 'Recovered': 0}, {'Active': 49, 'City': '', 'CityCode': '', 'Confirmed': 49, 'Country': 'United States of America', 'CountryCode': 'US', 'Date': '2020-08-04T00:00:00Z', 'Deaths': 0, 'ID': '1b5a049b-59b6-474a-a4b3-290ff563da2b', 'Lat': '35.44', 'Lon': '139.64', 'Province': 'Diamond Princess', 'Recovered': 0}, {'Active': 100, 'City': '', 'CityCode': '', 'Confirmed': 103, 'Country': 'United States of America', 'CountryCode': 'US', 'Date': '2020-08-04T00:00:00Z', 'Deaths': 3, 'ID': 'd3458b9a-8446-4760-89ab-7367195b6c8e', 'Lat': '37.65', 'Lon': '-122.67', 'Province': 'Grand Princess', 'Recovered': 0}, {'Active': 691638, 'City': '', 'CityCode': '', 'Confirmed': 709622, 'Country': 'United States of America', 'CountryCode': 'US', 'Date': '2021-06-25T00:00:00Z', 'Deaths': 17984, 'ID': '03329e68-7b08-413e-8a83-d7a5baa6537d', 'Lat': '42.23', 'Lon': '-71.53', 'Province': 'Massachusetts', 'Recovered': 0}]
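Since the result is a list of dictionaries, reaching a single value takes two steps: an index to pick a dictionary, then a key to pick a value. The following is a minimal sketch using two shortened records copied from the output above, so it runs without an Internet connection; total_confirmed is a hypothetical helper included only to show a loop over the list.

```python
# Two sample records copied (and shortened) from the output above
sample = [
    {"Province": "Grand Princess", "Confirmed": 103, "Deaths": 3,
     "Date": "2020-08-04T00:00:00Z"},
    {"Province": "Massachusetts", "Confirmed": 709622, "Deaths": 17984,
     "Date": "2021-06-25T00:00:00Z"},
]

first = sample[0]            # index into the list to get one dictionary
print(first["Province"])     # then use a key to get one value

# Loop over the list to visit every dictionary
for entry in sample:
    print(entry["Province"], entry["Confirmed"])

def total_confirmed(records):
    """Add up the 'Confirmed' value from each dictionary in the list."""
    total = 0
    for entry in records:
        total += entry["Confirmed"]
    return total

print(total_confirmed(sample))
```

The same indexing pattern applies to the full list returned by the API, where the same province may appear on several dates.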
Explore the data that you received from the API request. Answer the following questions about it:
"Province" field of each dictionary.
Extra Programming Challenge: Write the code that will print the most recent entry from the state of Iowa.
Now we're going to try accessing some different data from the same Web API service. Notice that the code below is the same, but it uses a different web address - these different web addresses are called endpoints of the API.
import requests
response = requests.get("https://api.covid19api.com/summary")
data = response.json()
Answer the following questions about the data that you received in this request.
data["Countries"]?
Now we're going to use the data we received from the API to display some interactive visualizations.
We will start with a bar chart made with the Plotly Express bar function (reference: https://plotly.com/python/bar-charts/ ). This function takes three main arguments:
data["Countries"] from the API request above!
If everything works right, the following code should open a web browser and display a visualization like the one shown below.
import plotly.express as px
import requests
response = requests.get("https://api.covid19api.com/summary")
data = response.json()
country_data = data["Countries"]
fig = px.bar(country_data,x="Country",y="NewConfirmed",title="New Confirmed Cases by Country")
fig.show()
Note that the bar chart above has too many countries to see any of the data well. Before making the plot, look through the list of countries and remove any that have fewer than 5,000 new cases (it may be easier to make a new list and append only the countries with at least 5,000 new cases).
import plotly.express as px
import requests
response = requests.get("https://api.covid19api.com/summary")
data = response.json()
country_data = data["Countries"]
filtered_data = []
for entry in country_data:
    #keep only the countries with at least 5,000 new cases
    if entry["NewConfirmed"] >= 5000:
        filtered_data.append(entry)
fig = px.bar(filtered_data,x="Country",y="NewConfirmed",title="New Confirmed Cases by Country")
fig.show()
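The same filtering step can also be written with a list comprehension, and sorting the result often makes the chart easier to read. The sketch below is one possible variation, not the required solution; it uses made-up sample records (the country names and numbers are invented) so it runs without the API, and filter_and_sort is a hypothetical helper name.

```python
# Sample records shaped like the entries in data["Countries"];
# the names and numbers here are made up for illustration.
country_data = [
    {"Country": "A-land", "NewConfirmed": 12000},
    {"Country": "B-land", "NewConfirmed": 300},
    {"Country": "C-land", "NewConfirmed": 5000},
    {"Country": "D-land", "NewConfirmed": 48000},
]

def filter_and_sort(records, minimum=5000):
    """Keep records with at least `minimum` new cases, largest first."""
    kept = [entry for entry in records if entry["NewConfirmed"] >= minimum]
    return sorted(kept, key=lambda entry: entry["NewConfirmed"], reverse=True)

filtered_data = filter_and_sort(country_data)
for entry in filtered_data:
    print(entry["Country"], entry["NewConfirmed"])
```

The returned list can be passed to px.bar in place of the unsorted list.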
In this part of the assignment, you are going to build a web dashboard that allows the user to adjust the visualization using user interface components like dropdown menus.
If you have not used Dash before, first complete this associated Dash Lab.
One of the really cool things about Dash is that it is designed to work well with Plotly.
There is a Dash component called Graph which expects a parameter called figure - you can pass that any kind of Plotly figure. Try this and confirm that you can see the figure loaded into your Dash application.
from dash import Dash, html, dcc
from dash.dependencies import Input, Output
import requests
import plotly.express as px
#read the country-level data from the API
response = requests.get("https://api.covid19api.com/summary")
DATA = response.json()
fig = px.bar(DATA["Countries"],x="Country",y="NewConfirmed",title="New Confirmed Cases by Country")
app = Dash(__name__)
app.layout = html.Div(children = [
    dcc.Markdown(
        id = "title",
        children = "## COVID Dashboard"
    ),
    dcc.Graph(
        id = "country_bar_graph",
        figure = fig
    )
])
if __name__ == '__main__':
    #older Dash versions use app.run_server(debug=True) instead
    app.run(debug=True)
Update the Dash app so that the user can select which countries they want to see on the bar graph. It should look something like this:
To de-emphasize the challenge in developing the application and instead emphasize lists and dictionaries, the following starter code can be provided. The places marked with # TODO show where code needs to be added.
from dash import Dash, html, dcc
from dash.dependencies import Input, Output
import requests
import plotly.express as px
#read the country-level data from the API
response = requests.get("https://api.covid19api.com/summary")
DATA = response.json()
#create the initial figure of all the countries
#- this will quickly get replaced
fig = px.bar(DATA["Countries"],x="Country",y="NewConfirmed",title="New Confirmed Cases by Country")
country_names_list = ["United States of America","Canada","Mexico"]
# TODO: change this so that country_names_list contains strings for each
# country that appears in one of the dictionaries in DATA["Countries"]
app = Dash(__name__)
app.layout = html.Div(children = [
    dcc.Markdown(
        id = "title",
        children = "## COVID Dashboard"
    ),
    dcc.Dropdown(
        id = "country_select_dropdown",
        options = country_names_list,
        value = ["United States of America","Canada","Mexico"],
        multi = True #allows us to select multiple values
    ),
    dcc.Graph(
        id = "country_bar_graph",
        figure = fig
    )
])
@app.callback(
    Output("country_bar_graph","figure"),
    Input("country_select_dropdown","value"),
)
def update_country_graph(country_names):
    #country_names is a list with the countries
    #selected from the dropdown by the user
    records_to_display = DATA["Countries"]
    # TODO: Change records_to_display so that it contains only the countries listed in
    # the country_names parameter
    fig = px.bar(records_to_display,x="Country",y="NewConfirmed",title="New Confirmed Cases by Country")
    return fig
if __name__ == '__main__':
    #older Dash versions use app.run_server(debug=True) instead
    app.run(debug=True)
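For instructors, the list and dictionary work behind the two TODO items can be sketched separately from the Dash code. This is one possible approach under stated assumptions, not the only solution: the sample records are made up, and get_country_names and select_records are hypothetical helper names that do not appear in the starter code.

```python
# Made-up sample standing in for DATA["Countries"]
countries = [
    {"Country": "Canada", "NewConfirmed": 2500},
    {"Country": "Mexico", "NewConfirmed": 4100},
    {"Country": "United States of America", "NewConfirmed": 30000},
]

def get_country_names(records):
    """Build the list of names offered in the dropdown."""
    return [entry["Country"] for entry in records]

def select_records(records, country_names):
    """Keep only the records whose country was selected."""
    # Treat a missing selection (None) the same as an empty list
    selected = set(country_names or [])
    return [entry for entry in records if entry["Country"] in selected]

print(get_country_names(countries))
print(select_records(countries, ["Mexico", "Canada"]))
```

In the starter code, the first helper would correspond to building country_names_list and the second to narrowing records_to_display inside the callback.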
This should set the stage for many potential creative projects. Here are some examples showing how this data can be used with different kinds of Plotly visualizations.
See more map examples at https://plotly.com/python/maps/
import plotly.express as px
import requests
response = requests.get("https://api.covid19api.com/summary")
data = response.json()
country_data = data["Countries"]
fig = px.choropleth(country_data,locations="Country",locationmode="country names",color="NewConfirmed",range_color=(0,10000),title="New Confirmed Cases by Country")
fig.show()
There are many kinds of charts you can use. See the gallery here: https://plotly.com/python/basic-charts/
The example below shows a line graph of the state-level COVID data.
import plotly.express as px
import requests
from dateutil.parser import parse
response = requests.get("https://api.covid19api.com/live/country/united-states")
data = response.json()
#there are some erroneous dates
#this removes those entries
filtered_data = []
for entry in data:
    if parse(entry["Date"]) > parse("2021-01-01T00:00:00Z"):
        filtered_data.append(entry)
#the y-axis shows confirmed cases, so the title says so
fig = px.line(filtered_data, x="Date", y="Confirmed", color="Province", title='Confirmed COVID Cases in the US')
fig.show()
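The date filter above relies on the dateutil package. As an alternative sketch, the same comparison can be done with the standard library's datetime module, assuming every date string follows the format seen in the API output earlier. The sample records below reuse dates from that output, and entries_after is a hypothetical helper name.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # matches strings like 2021-06-25T00:00:00Z

def entries_after(records, cutoff):
    """Keep records whose 'Date' is later than the cutoff string."""
    cutoff_time = datetime.strptime(cutoff, DATE_FORMAT)
    return [entry for entry in records
            if datetime.strptime(entry["Date"], DATE_FORMAT) > cutoff_time]

# Sample records using dates that appear in the API output shown earlier
sample = [
    {"Province": "American Samoa", "Date": "1970-01-01T00:00:00Z"},
    {"Province": "Grand Princess", "Date": "2020-08-04T00:00:00Z"},
    {"Province": "Massachusetts", "Date": "2021-06-25T00:00:00Z"},
]
recent = entries_after(sample, "2021-01-01T00:00:00Z")
print([entry["Province"] for entry in recent])
```

Converting the strings to datetime objects before comparing makes the intent explicit, and strptime raises an error if a date arrives in an unexpected format.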
There are many other APIs with interesting data that can be used for applications like this. Most require programmers to register their application and request API keys that must be sent along with the request, so you will need to look at some examples of how to use each one. I recommend the following: