Reading Data from Files

CS 65: Introduction to Computer Science I

Why use files in Python?

Often, your program needs to work with data stored in files, because you don't want to make the user type in a whole bunch of values every time they run it.

Examples:

  • analyzing weather data
  • generating payroll
  • saving progress and returning to it later (like in a word processor)

Opening files in Python

Python provides a built-in open() function, which returns a file object. Its type has a scary name like _io.TextIOWrapper.

In [1]:
# Assumes the gettysburg.txt file is in
# the same directory as your .py file
gettysburg_file = open("gettysburg.txt")
type(gettysburg_file)
Out[1]:
_io.TextIOWrapper

For our purposes, just think of gettysburg_file as a variable that represents a file object.
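
To make the idea of a file object a little more concrete, here is a minimal sketch that inspects two of its attributes. The file name sample.txt and its contents are made up for illustration, and the file is created in a temporary folder only so that the example runs on its own.

```python
import os
import tempfile

# Set-up (hypothetical): create a small text file so the example is self-contained.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "sample.txt")
setup_file = open(path, "w")
setup_file.write("first line\nsecond line\n")
setup_file.close()

sample_file = open(path)       # with no second argument, open() reads text (mode "r")
print(sample_file.mode)        # r
print(sample_file.closed)      # False: the file is still open
sample_file.close()            # tell Python we are done with the file
print(sample_file.closed)      # True
```

A file that has been opened with open() stays open until close() is called on it.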

Making sure your files are in the right place

Reading data from files

File objects provide several methods for reading data.

  • read() reads the file into a big string
  • readline() reads the next line of the file into a string
  • readlines() reads the lines of the file into a list of strings
In [2]:
gettysburg_file = open("gettysburg.txt")
file_contents = gettysburg_file.read()
print(file_contents)
Fourscore and seven years ago our fathers brought forth on
this continent a new nation, conceived in liberty and
dedicated to the proposition that all men are created equal.

Now we are engaged in a great civil war, testing whether 
that nation or any nation so conceived and so dedicated can 
long endure. We are met on a great battlefield of that war. 
We have come to dedicate a portion of it as a final resting 
place for those who died here that the nation might live. 
This we may, in all propriety do. But in a larger sense, we 
cannot dedicate, we cannot consecrate, we cannot hallow this 
ground. The brave men, living and dead who struggled here 
have hallowed it far above our poor power to add or detract. 
The world will little note nor long remember what we say 
here, but it can never forget what they did here.

It is rather for us the living, we here be dedicated to the 
great task remaining before us--that from these honored dead 
we take increased devotion to that cause for which they here 
gave the last full measure of devotion--that we here highly 
resolve that these dead shall not have died in vain, that 
this nation shall have a new birth of freedom, and that 
government of the people, by the people, for the people 
shall not perish from the earth.

In [3]:
gettysburg_file = open("gettysburg.txt")
firstline = gettysburg_file.readline()
secondline = gettysburg_file.readline()
print(secondline)
this continent a new nation, conceived in liberty and

In [4]:
gettysburg_file = open("gettysburg.txt")
contents_as_list = gettysburg_file.readlines()
print( contents_as_list[8] )
place for those who died here that the nation might live. 
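
Each example above reopens the file before reading. One reason is that a file object keeps track of how far it has read, so later calls pick up where earlier ones stopped. The sketch below shows this with a small made-up file (sample.txt and its three lines are assumptions for illustration only).

```python
import os
import tempfile

# Set-up (hypothetical): a three-line file in a temporary folder.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "sample.txt")
setup_file = open(path, "w")
setup_file.write("alpha\nbeta\ngamma\n")
setup_file.close()

sample_file = open(path)
first = sample_file.readline()     # reads only the first line
rest = sample_file.readlines()     # reads the lines that have not been read yet
leftover = sample_file.read()      # nothing is left, so this is an empty string
sample_file.close()

print(repr(first))      # 'alpha\n'
print(rest)             # ['beta\n', 'gamma\n']
print(repr(leftover))   # ''
```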

Opening files using with statements

Opening a file with a with statement does the same thing, but it also automatically closes the file when the indented block finishes.

In [5]:
with open("gettysburg.txt") as gettysburg_file:
    
    gettysburg_text = gettysburg_file.readlines()
    
    print(gettysburg_text)
    
print("All done with the file.")
['Fourscore and seven years ago our fathers brought forth on\n', 'this continent a new nation, conceived in liberty and\n', 'dedicated to the proposition that all men are created equal.\n', '\n', 'Now we are engaged in a great civil war, testing whether \n', 'that nation or any nation so conceived and so dedicated can \n', 'long endure. We are met on a great battlefield of that war. \n', 'We have come to dedicate a portion of it as a final resting \n', 'place for those who died here that the nation might live. \n', 'This we may, in all propriety do. But in a larger sense, we \n', 'cannot dedicate, we cannot consecrate, we cannot hallow this \n', 'ground. The brave men, living and dead who struggled here \n', 'have hallowed it far above our poor power to add or detract. \n', 'The world will little note nor long remember what we say \n', 'here, but it can never forget what they did here.\n', '\n', 'It is rather for us the living, we here be dedicated to the \n', 'great task remaining before us--that from these honored dead \n', 'we take increased devotion to that cause for which they here \n', 'gave the last full measure of devotion--that we here highly \n', 'resolve that these dead shall not have died in vain, that \n', 'this nation shall have a new birth of freedom, and that \n', 'government of the people, by the people, for the people \n', 'shall not perish from the earth.\n']
All done with the file.
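
One way to see the automatic closing is to check the file object's closed attribute inside and after the with block. This is a small sketch using a made-up file (sample.txt is an assumption, created in a temporary folder so the example runs on its own).

```python
import os
import tempfile

# Set-up (hypothetical): a two-line file in a temporary folder.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "sample.txt")
with open(path, "w") as setup_file:
    setup_file.write("alpha\nbeta\n")

with open(path) as sample_file:
    lines = sample_file.readlines()
    print(sample_file.closed)   # False: still inside the with block

print(sample_file.closed)       # True: the with statement closed the file
print(lines)                    # ['alpha\n', 'beta\n'] (the data we read is still available)
```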

Analyzing Baby Names Example

Let's say we want to analyze baby name popularity, and we have a file with data from https://www.ssa.gov/oact/babynames/decades/names2010s.html

In [7]:
with open("top_male_baby_names_2010s.txt") as male_names_file:
    male_names = male_names_file.readlines()
    
    print(male_names)
['Noah\n', 'Liam\n', 'Jacob\n', 'William\n', 'Mason\n', 'Ethan\n', 'Michael\n', 'Alexander\n', 'James\n', 'Elijah\n', 'Benjamin\n', 'Daniel\n', 'Aiden\n', 'Logan\n', 'Jayden\n', 'Matthew\n', 'Lucas\n', 'David\n', 'Jackson\n', 'Joseph\n', 'Anthony\n', 'Samuel\n', 'Joshua\n', 'Gabriel\n', 'Andrew\n', 'John\n', 'Christopher\n', 'Oliver\n', 'Dylan\n', 'Carter\n', 'Isaac\n', 'Luke\n', 'Henry\n', 'Owen\n', 'Ryan\n', 'Nathan\n', 'Wyatt\n', 'Caleb\n', 'Sebastian\n', 'Jack\n', 'Christian\n', 'Jonathan\n', 'Julian\n', 'Landon\n', 'Levi\n', 'Isaiah\n', 'Hunter\n', 'Aaron\n', 'Charles\n', 'Thomas\n', 'Eli\n', 'Jaxon\n', 'Connor\n', 'Nicholas\n', 'Jeremiah\n', 'Grayson\n', 'Cameron\n', 'Brayden\n', 'Adrian\n', 'Evan\n', 'Jordan\n', 'Josiah\n', 'Angel\n', 'Robert\n', 'Gavin\n', 'Tyler\n', 'Austin\n', 'Colton\n', 'Jose\n', 'Dominic\n', 'Brandon\n', 'Ian\n', 'Lincoln\n', 'Hudson\n', 'Kevin\n', 'Zachary\n', 'Adam\n', 'Mateo\n', 'Jason\n', 'Chase\n', 'Nolan\n', 'Ayden\n', 'Cooper\n', 'Parker\n', 'Xavier\n', 'Asher\n', 'Carson\n', 'Jace\n', 'Easton\n', 'Justin\n', 'Leo\n', 'Bentley\n', 'Jaxson\n', 'Nathaniel\n', 'Blake\n', 'Elias\n', 'Theodore\n', 'Kayden\n', 'Luis\n', 'Tristan\n', 'Ezra\n', 'Bryson\n', 'Juan\n', 'Brody\n', 'Vincent\n', 'Micah\n', 'Miles\n', 'Santiago\n', 'Cole\n', 'Ryder\n', 'Carlos\n', 'Damian\n', 'Leonardo\n', 'Roman\n', 'Max\n', 'Sawyer\n', 'Jesus\n', 'Diego\n', 'Greyson\n', 'Alex\n', 'Maxwell\n', 'Axel\n', 'Eric\n', 'Wesley\n', 'Declan\n', 'Giovanni\n', 'Ezekiel\n', 'Braxton\n', 'Ashton\n', 'Ivan\n', 'Hayden\n', 'Camden\n', 'Silas\n', 'Bryce\n', 'Weston\n', 'Harrison\n', 'Jameson\n', 'George\n', 'Antonio\n', 'Timothy\n', 'Kaiden\n', 'Jonah\n', 'Everett\n', 'Miguel\n', 'Steven\n', 'Richard\n', 'Emmett\n', 'Victor\n', 'Kaleb\n', 'Kai\n', 'Maverick\n', 'Joel\n', 'Bryan\n', 'Maddox\n', 'Kingston\n', 'Aidan\n', 'Patrick\n', 'Edward\n', 'Emmanuel\n', 'Jude\n', 'Alejandro\n', 'Preston\n', 'Luca\n', 'Bennett\n', 'Jesse\n', 'Colin\n', 'Jaden\n', 'Malachi\n', 'Kaden\n', 
'Jayce\n', 'Alan\n', 'Kyle\n', 'Marcus\n', 'Brian\n', 'Ryker\n', 'Grant\n', 'Jeremy\n', 'Abel\n', 'Riley\n', 'Calvin\n', 'Brantley\n', 'Caden\n', 'Oscar\n', 'Abraham\n', 'Brady\n', 'Sean\n', 'Jake\n', 'Tucker\n', 'Nicolas\n', 'Mark\n', 'Amir\n', 'Avery\n', 'King\n', 'Gael\n', 'Kenneth\n', 'Bradley\n', 'Cayden\n', 'Xander\n', 'Graham\n', 'Rowan']

It's annoying that the newline character \n is included at the end of the strings. To remove these, you could use the rstrip() string method, which strips whitespace (including \n) from the right end of a string.

In [8]:
name = "Eric\n"
name
Out[8]:
'Eric\n'
In [9]:
name.rstrip()
Out[9]:
'Eric'
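
A few more cases may help show exactly what rstrip() removes. The strings here are made up for illustration; passing "\n" as an argument is an optional variation that strips only newline characters.

```python
padded = "Eric \n"

stripped_all = padded.rstrip()           # removes the trailing space and the newline
stripped_newline = padded.rstrip("\n")   # removes only the trailing newline
left_untouched = "\nEric\n".rstrip()     # only the right end is stripped

print(repr(stripped_all))       # 'Eric'
print(repr(stripped_newline))   # 'Eric '
print(repr(left_untouched))     # '\nEric'
```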

Looping through and removing all the newlines

In [10]:
with open("top_male_baby_names_2010s.txt") as male_names_file:
    
    male_names = male_names_file.readlines()
    
    name_counter = 0

    # Replace each name with a copy that has its trailing newline removed.
    while name_counter < len(male_names):
        male_names[name_counter] = male_names[name_counter].rstrip()
        name_counter += 1

    print(male_names)
['Noah', 'Liam', 'Jacob', 'William', 'Mason', 'Ethan', 'Michael', 'Alexander', 'James', 'Elijah', 'Benjamin', 'Daniel', 'Aiden', 'Logan', 'Jayden', 'Matthew', 'Lucas', 'David', 'Jackson', 'Joseph', 'Anthony', 'Samuel', 'Joshua', 'Gabriel', 'Andrew', 'John', 'Christopher', 'Oliver', 'Dylan', 'Carter', 'Isaac', 'Luke', 'Henry', 'Owen', 'Ryan', 'Nathan', 'Wyatt', 'Caleb', 'Sebastian', 'Jack', 'Christian', 'Jonathan', 'Julian', 'Landon', 'Levi', 'Isaiah', 'Hunter', 'Aaron', 'Charles', 'Thomas', 'Eli', 'Jaxon', 'Connor', 'Nicholas', 'Jeremiah', 'Grayson', 'Cameron', 'Brayden', 'Adrian', 'Evan', 'Jordan', 'Josiah', 'Angel', 'Robert', 'Gavin', 'Tyler', 'Austin', 'Colton', 'Jose', 'Dominic', 'Brandon', 'Ian', 'Lincoln', 'Hudson', 'Kevin', 'Zachary', 'Adam', 'Mateo', 'Jason', 'Chase', 'Nolan', 'Ayden', 'Cooper', 'Parker', 'Xavier', 'Asher', 'Carson', 'Jace', 'Easton', 'Justin', 'Leo', 'Bentley', 'Jaxson', 'Nathaniel', 'Blake', 'Elias', 'Theodore', 'Kayden', 'Luis', 'Tristan', 'Ezra', 'Bryson', 'Juan', 'Brody', 'Vincent', 'Micah', 'Miles', 'Santiago', 'Cole', 'Ryder', 'Carlos', 'Damian', 'Leonardo', 'Roman', 'Max', 'Sawyer', 'Jesus', 'Diego', 'Greyson', 'Alex', 'Maxwell', 'Axel', 'Eric', 'Wesley', 'Declan', 'Giovanni', 'Ezekiel', 'Braxton', 'Ashton', 'Ivan', 'Hayden', 'Camden', 'Silas', 'Bryce', 'Weston', 'Harrison', 'Jameson', 'George', 'Antonio', 'Timothy', 'Kaiden', 'Jonah', 'Everett', 'Miguel', 'Steven', 'Richard', 'Emmett', 'Victor', 'Kaleb', 'Kai', 'Maverick', 'Joel', 'Bryan', 'Maddox', 'Kingston', 'Aidan', 'Patrick', 'Edward', 'Emmanuel', 'Jude', 'Alejandro', 'Preston', 'Luca', 'Bennett', 'Jesse', 'Colin', 'Jaden', 'Malachi', 'Kaden', 'Jayce', 'Alan', 'Kyle', 'Marcus', 'Brian', 'Ryker', 'Grant', 'Jeremy', 'Abel', 'Riley', 'Calvin', 'Brantley', 'Caden', 'Oscar', 'Abraham', 'Brady', 'Sean', 'Jake', 'Tucker', 'Nicolas', 'Mark', 'Amir', 'Avery', 'King', 'Gael', 'Kenneth', 'Bradley', 'Cayden', 'Xander', 'Graham', 'Rowan']
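
The same clean-up can also be written by building a new list with a for loop instead of updating the old list by position. This is only an alternative sketch; raw_names is a short stand-in for the list that readlines() returns.

```python
# Hypothetical stand-in for the list returned by readlines().
raw_names = ["Noah\n", "Liam\n", "Jacob\n"]

clean_names = []
for raw_name in raw_names:
    # rstrip() returns a new string; the original string is not changed.
    clean_names.append(raw_name.rstrip())

print(clean_names)   # ['Noah', 'Liam', 'Jacob']
```

This version leaves raw_names as it was, which can be handy if the unstripped lines are needed later.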

Now we can use Python to ask interesting questions of our data

Here's a program that lets the user check how popular a name was.

In [11]:
name_to_search = input("Enter a name: ")
if name_to_search in male_names:
    position = male_names.index(name_to_search)
    print(name_to_search,"was the number",(position+1),"most popular male name in the 2010s.")
else:
    print(name_to_search,"was not a popular name in the 2010s.")
Enter a name: Ryker
Ryker was the number 175 most popular male name in the 2010s.
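
Two details of that program are worth a closer look: why it adds 1 to the position, and why it checks with in before calling index(). The sketch below uses a short stand-in list (sample_names holds just the first three names) and a made-up name, "Zed", that is not in it.

```python
# Hypothetical short list: the first three entries of male_names.
sample_names = ["Noah", "Liam", "Jacob"]

position = sample_names.index("Noah")
print(position)        # 0: list positions start at 0
print(position + 1)    # 1: the rank a person would expect to see

# index() raises a ValueError when the name is missing,
# so the program checks with `in` before calling it.
if "Zed" in sample_names:
    print(sample_names.index("Zed"))
else:
    print("Zed is not in the list.")
```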