The CSV File Format and Two Dimensional Lists

CS 65: Introduction to Computer Science I

A useful string method: split()

split() is a string method that breaks a string into a list of parts based on some delimiter.

In [1]:
marathon_time = "4:52:20"
marathon_time_split = marathon_time.split(":")
print(marathon_time_split)
['4', '52', '20']
In [2]:
rainfall_data = "0.0, 0.4, 1.3, 1.1, 2.5, 0.0, 0.6"
rainfall_list = rainfall_data.split(", ")
print(rainfall_list)
print(float(rainfall_list[4]))
['0.0', '0.4', '1.3', '1.1', '2.5', '0.0', '0.6']
2.5
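
The pieces returned by split() are always strings, so they have to be converted before doing arithmetic with them. As a minimal sketch, the marathon time from above can be turned into a total number of seconds (the variable names hours, minutes, seconds, and total_seconds are illustrative and not part of the original notes):

```python
marathon_time = "4:52:20"

# split(":") returns three strings; unpack them into separate variables
hours, minutes, seconds = marathon_time.split(":")

# Convert each piece to an int before doing arithmetic
total_seconds = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
print(total_seconds)
```

Unpacking into three names only works when the string has exactly two colons; a time such as "52:20" would raise a ValueError here.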

CSV files - comma-separated values

CSV is a common, simple file format for data stored in tables. CSV files can usually be opened with either spreadsheet software or a text editor.

National parks data is from https://en.wikipedia.org/wiki/List_of_areas_in_the_United_States_National_Park_System

Option 1: Read it into a list and then process each row as needed

In [9]:
with open("nationalparks.csv") as parkfile:
    parks = parkfile.readlines()

#print(parks)

parks[3]
Out[9]:
'Arches National Park,Utah,1971,76678.98\n'
In [10]:
arches = parks[3]
print(arches)
arches = arches.rstrip()
arches_list = arches.split(',')
print(arches_list)
arches_acres = float(arches_list[3])
print(arches_list[0],"is",(arches_acres/640),"square miles")
Arches National Park,Utah,1971,76678.98

['Arches National Park', 'Utah', '1971', '76678.98']
Arches National Park is 119.81090624999999 square miles
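
One limitation of splitting on commas by hand is that a field may itself contain a comma. The sketch below assumes a row for Death Valley in which the location is wrapped in double quotes, which is the usual CSV convention; the exact text of that line in nationalparks.csv is an assumption. It compares split(',') with Python's csv module:

```python
import csv

# Assumed raw line: the location field contains a comma, so it is quoted
line = 'Death Valley National Park,"California, Nevada",1994,3408395.63'

# Naive approach: split on every comma, including the one inside the quotes
naive_fields = line.split(",")

# csv.reader accepts any iterable of lines and understands quoted fields
csv_fields = next(csv.reader([line]))

print(len(naive_fields), naive_fields)
print(len(csv_fields), csv_fields)
```

With split(','), the location is broken into two pieces and the acreage ends up at index 4 instead of index 3, so code such as float(arches_list[3]) would fail on a row like this one.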

Option 2: use the csv module

In [11]:
import csv

with open("nationalparks.csv") as npfile:
    parks = csv.reader(npfile)
    parks = list(parks)
    
parks
Out[11]:
[['Name', 'Location', 'Year established', 'Area in acres'],
 ['Acadia National Park', 'Maine', '1919', '49076.63'],
 ['National Park of American Samoa', 'American Samoa', '1988', '8256.67'],
 ['Arches National Park', 'Utah', '1971', '76678.98'],
 ['Badlands National Park', 'South Dakota', '1978', '242755.94'],
 ['Big Bend National Park', 'Texas', '1944', '801163.21'],
 ['Biscayne National Park', 'Florida', '1980', '172971.11'],
 ['Black Canyon of the Gunnison National Park',
  'Colorado',
  '1999',
  '30779.83'],
 ['Bryce Canyon National Park', 'Utah', '1928', '35835.08'],
 ['Canyonlands National Park', 'Utah', '1964', '337597.83'],
 ['Capitol Reef National Park', 'Utah', '1971', '241904.50'],
 ['Carlsbad Caverns National Park', 'New Mexico', '1930', '46766.45'],
 ['Channel Islands National Park', 'California', '1980', '249561.00'],
 ['Congaree National Park', 'South Carolina', '2003', '26476.47'],
 ['Crater Lake National Park', 'Oregon', '1902', '183224.05'],
 ['Cuyahoga Valley National Park', 'Ohio', '2000', '32571.88'],
 ['Death Valley National Park', 'California, Nevada', '1994', '3408395.63'],
 ['Denali National Park', 'Alaska', '1917', '4740911.16'],
 ['Dry Tortugas National Park', 'Florida', '1992', '64701.22'],
 ['Everglades National Park', 'Florida', '1947', '1508938.57'],
 ['Gates of the Arctic National Park', 'Alaska', '1980', '7523897.45'],
 ['Gateway Arch National Park', 'Missouri', '2018', '192.83'],
 ['Glacier National Park (part of Waterton-Glacier International Peace Park)',
  'Montana',
  '1910',
  '1013126.39'],
 ['Glacier Bay National Park', 'Alaska', '1980', '3223383.43'],
 ['Grand Canyon National Park', 'Arizona', '1919', '1201647.03'],
 ['Grand Teton National Park', 'Wyoming', '1929', '310044.36'],
 ['Great Basin National Park', 'Nevada', '1986', '77180.00'],
 ['Great Sand Dunes National Park', 'Colorado', '2004', '107341.87'],
 ['Great Smoky Mountains National Park',
  'North Carolina, Tennessee',
  '1934',
  '522426.88'],
 ['Guadalupe Mountains National Park', 'Texas', '1966', '86367.10'],
 ['Haleakala National Park', 'Hawaii', '1916', '33264.62'],
 ['Hawaii Volcanoes National Park', 'Hawaii', '1916', '325605.28'],
 ['Hot Springs National Park', 'Arkansas', '1921', '5554.15'],
 ['Indiana Dunes National Park', 'Indiana', '2019', '15349.08'],
 ['Isle Royale National Park', 'Michigan', '1940', '571790.30'],
 ['Joshua Tree National Park', 'California', '1994', '795155.85'],
 ['Katmai National Park', 'Alaska', '1980', '3674529.33'],
 ['Kenai Fjords National Park', 'Alaska', '1980', '669650.05'],
 ['Kings Canyon National Park', 'California', '1940', '461901.20'],
 ['Kobuk Valley National Park', 'Alaska', '1980', '1750716.16'],
 ['Lake Clark National Park', 'Alaska', '1980', '2619816.49'],
 ['Lassen Volcanic National Park', 'California', '1916', '106589.02'],
 ['Mammoth Cave National Park', 'Kentucky', '1941', '54011.91'],
 ['Mesa Verde National Park', 'Colorado', '1906', '52485.17'],
 ['Mount Rainier National Park', 'Washington', '1899', '236381.64'],
 ['New River Gorge National Park and Preserve',
  'West Virginia',
  '2020',
  '72185.76'],
 ['North Cascades National Park', 'Washington', '1968', '504780.94'],
 ['Olympic National Park', 'Washington', '1938', '922649.41'],
 ['Petrified Forest National Park', 'Arizona', '1962', '221390.21'],
 ['Pinnacles National Park', 'California', '2013', '26685.73'],
 ['Redwood National and State Parks', 'California', '1968', '138999.37'],
 ['Rocky Mountain National Park', 'Colorado', '1915', '265807.25'],
 ['Saguaro National Park', 'Arizona', '1994', '92867.42'],
 ['Sequoia National Park', 'California', '1890', '404062.63'],
 ['Shenandoah National Park', 'Virginia', '1935', '199223.77'],
 ['Theodore Roosevelt National Park', 'North Dakota', '1978', '70446.89'],
 ['Virgin Islands National Park', 'U.S. Virgin Islands', '1956', '15052.33'],
 ['Voyageurs National Park', 'Minnesota', '1975', '218222.35'],
 ['White Sands National Park', 'New Mexico', '2019', '146344.31'],
 ['Wind Cave National Park', 'South Dakota', '1903', '33970.84'],
 ['Wrangell–St. Elias National Park', 'Alaska', '1980', '8323146.48'],
 ['Yellowstone National Park',
  'Idaho, Montana, Wyoming',
  '1872',
  '2219790.71'],
 ['Yosemite National Park', 'California', '1890', '761747.50'],
 ['Zion National Park', 'Utah', '1919', '147242.66']]

Tabular data stored in a list-of-lists like this is sometimes called a two-dimensional list.

2D list data can be accessed with two indices, the first one for the row, and the second one for the column.

In [12]:
print(parks[3][0]) #row 3, column 0
print(parks[3][3]) #row 3, column 3

print(parks[3][0],"is",(float(parks[3][3])/640),"square miles")
Arches National Park
76678.98
Arches National Park is 119.81090624999999 square miles
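
To see why two indices work, it can help to apply them one at a time. The sketch below uses a small hand-typed table (three rows copied from the output above, stored under the illustrative name table) rather than the full file:

```python
# A small 2D list: each inner list is one row of the table
table = [
    ["Name", "Location", "Year established", "Area in acres"],
    ["Arches National Park", "Utah", "1971", "76678.98"],
    ["Zion National Park", "Utah", "1919", "147242.66"],
]

row = table[1]    # the first index selects a whole row, which is itself a list
name = row[0]     # the second index selects one column within that row

print(row)
print(name)
print(table[1][0] == name)        # the same value, reached in a single step

num_rows = len(table)             # number of rows, including the header
num_cols = len(table[0])          # number of columns in the first row
print(num_rows, "rows and", num_cols, "columns")
```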

Processing all rows

If we want to do something with every row or every column (or both), we can iterate through the 2D list with a loop.

In [13]:
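import csv

with open("nationalparks.csv") as npfile:
    parks = csv.reader(npfile)
    parks = list(parks)

# Start at 1 to skip the header row at index 0
park_counter = 1

while park_counter < len(parks):
    # Column 0 is the name; column 3 is the area in acres (640 acres = 1 square mile)
    print(parks[park_counter][0],"is",(float(parks[park_counter][3])/640),"square miles")
    park_counter += 1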
Acadia National Park is 76.682234375 square miles
National Park of American Samoa is 12.901046875 square miles
Arches National Park is 119.81090624999999 square miles
Badlands National Park is 379.30615625 square miles
Big Bend National Park is 1251.817515625 square miles
Biscayne National Park is 270.26735937499996 square miles
Black Canyon of the Gunnison National Park is 48.093484375 square miles
Bryce Canyon National Park is 55.992312500000004 square miles
Canyonlands National Park is 527.496609375 square miles
Capitol Reef National Park is 377.97578125 square miles
Carlsbad Caverns National Park is 73.07257812499999 square miles
Channel Islands National Park is 389.9390625 square miles
Congaree National Park is 41.369484375 square miles
Crater Lake National Park is 286.287578125 square miles
Cuyahoga Valley National Park is 50.8935625 square miles
Death Valley National Park is 5325.6181718749995 square miles
Denali National Park is 7407.673687500001 square miles
Dry Tortugas National Park is 101.09565625 square miles
Everglades National Park is 2357.7165156250003 square miles
Gates of the Arctic National Park is 11756.089765625 square miles
Gateway Arch National Park is 0.301296875 square miles
Glacier National Park (part of Waterton-Glacier International Peace Park) is 1583.009984375 square miles
Glacier Bay National Park is 5036.536609375 square miles
Grand Canyon National Park is 1877.5734843750001 square miles
Grand Teton National Park is 484.44431249999997 square miles
Great Basin National Park is 120.59375 square miles
Great Sand Dunes National Park is 167.721671875 square miles
Great Smoky Mountains National Park is 816.292 square miles
Guadalupe Mountains National Park is 134.94859375000001 square miles
Haleakala National Park is 51.97596875000001 square miles
Hawaii Volcanoes National Park is 508.75825000000003 square miles
Hot Springs National Park is 8.678359375 square miles
Indiana Dunes National Park is 23.9829375 square miles
Isle Royale National Park is 893.4223437500001 square miles
Joshua Tree National Park is 1242.4310156249999 square miles
Katmai National Park is 5741.452078125 square miles
Kenai Fjords National Park is 1046.328203125 square miles
Kings Canyon National Park is 721.720625 square miles
Kobuk Valley National Park is 2735.4939999999997 square miles
Lake Clark National Park is 4093.4632656250005 square miles
Lassen Volcanic National Park is 166.54534375 square miles
Mammoth Cave National Park is 84.39360937500001 square miles
Mesa Verde National Park is 82.008078125 square miles
Mount Rainier National Park is 369.3463125 square miles
New River Gorge National Park and Preserve is 112.79024999999999 square miles
North Cascades National Park is 788.72021875 square miles
Olympic National Park is 1441.639703125 square miles
Petrified Forest National Park is 345.922203125 square miles
Pinnacles National Park is 41.696453125 square miles
Redwood National and State Parks is 217.186515625 square miles
Rocky Mountain National Park is 415.323828125 square miles
Saguaro National Park is 145.10534375 square miles
Sequoia National Park is 631.347859375 square miles
Shenandoah National Park is 311.287140625 square miles
Theodore Roosevelt National Park is 110.073265625 square miles
Virgin Islands National Park is 23.519265625 square miles
Voyageurs National Park is 340.972421875 square miles
White Sands National Park is 228.662984375 square miles
Wind Cave National Park is 53.0794375 square miles
Wrangell–St. Elias National Park is 13004.916375 square miles
Yellowstone National Park is 3468.422984375 square miles
Yosemite National Park is 1190.23046875 square miles
Zion National Park is 230.06665625 square miles
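
The same row-by-row processing can also be written with a for loop instead of a counter. This is an alternative sketch, not the approach used in the notes above. It runs on a small hand-typed subset of the data so that it does not need the file, and it rounds each result to two decimal places; the names ACRES_PER_SQUARE_MILE and total_acres are illustrative.

```python
# A few rows copied from the output above, including the header row
parks = [
    ["Name", "Location", "Year established", "Area in acres"],
    ["Acadia National Park", "Maine", "1919", "49076.63"],
    ["Arches National Park", "Utah", "1971", "76678.98"],
]

ACRES_PER_SQUARE_MILE = 640

total_acres = 0.0
for row in parks[1:]:             # the slice parks[1:] skips the header row
    acres = float(row[3])         # column 3 holds the area as a string
    total_acres += acres
    print(row[0], "is", round(acres / ACRES_PER_SQUARE_MILE, 2), "square miles")

print("Total area:", round(total_acres, 2), "acres")
```

The for loop removes the need to create and increment park_counter, at the cost of no longer having the row number available inside the loop.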

Example: How many national parks are in a given state?

In [14]:
import csv

with open("nationalparks.csv") as npfile:
    parks = csv.reader(npfile)
    parks = list(parks)
    
state = input("Enter a state: ")

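# Start at 1 to skip the header row at index 0
park_counter = 1
parks_in_state = 0

while park_counter < len(parks):
    # Column 1 is the location; this only counts rows whose location is exactly the state entered
    if parks[park_counter][1] == state:
        parks_in_state += 1

    park_counter += 1

print("There are",parks_in_state,"national parks in",state)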
Enter a state: Utah
There are 5 national parks in Utah
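
The comparison above requires the location to match the state exactly, so parks that span several states (for example 'California, Nevada') are not counted for any single state. One possible refinement, sketched here on a small hand-typed subset of the data, is to split the location on ", " and check whether the state appears in the resulting list. The state is hard-coded instead of read with input() so that the sketch runs unattended.

```python
# A few rows copied from the data above, including one multi-state park
parks = [
    ["Name", "Location", "Year established", "Area in acres"],
    ["Arches National Park", "Utah", "1971", "76678.98"],
    ["Death Valley National Park", "California, Nevada", "1994", "3408395.63"],
    ["Great Basin National Park", "Nevada", "1986", "77180.00"],
    ["Yosemite National Park", "California", "1890", "761747.50"],
]

state = "Nevada"   # hard-coded for the sketch; the notes use input() here

park_counter = 1
parks_in_state = 0

while park_counter < len(parks):
    # "California, Nevada" becomes ['California', 'Nevada']
    locations = parks[park_counter][1].split(", ")

    # The in operator checks whether state is one of the list's items
    if state in locations:
        parks_in_state += 1

    park_counter += 1

print("There are", parks_in_state, "national parks in", state)
```

Splitting first and then testing list membership avoids substring surprises: with a plain substring check, "Virginia" would also match a location of "West Virginia".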