This lab is a little different than the other labs we've done so far. The purpose of this lab is to get some experience dividing up a larger problem into several smaller ones that can each be isolated as a separate function. We'll also continue to add new functions with new program features in the next lab (Lab 13). We'll wait to turn everything in until we've completed both Lab 12 and Lab 13.
The program you are going to write solves an important problem in the field of artificial intelligence and machine learning - you will create a sentiment analyzer.
Sentiment Analysis is the problem of automatically detecting the general attitude behind a piece of written text. For example, does the text have a positive or negative attitude? Is it sarcastic? Is it a comment or a question? Is the mood happy, sad, angry, frustrated, etc.? A sentiment analyzer can help an organization sift through comments users have left on their website comment form, help flag Amazon product reviews that need a response from customer service, or help political campaigns get a feeling for what kinds of Tweets people are making on a candidate or issue.
Our sentiment analyzer is going to automatically determine whether a movie review has a positive or negative sentiment. The effect is something like this:
Our sentiment analyzer will work by first determining the sentiment of each word in the review, based on how it is used in sample movie reviews which have already been scored on a range of 0 to 4 (with 0 being the most negative, 4 being the most positive, and 2 being neutral). The good news is that you already did this in the challenge exercise for Lab 10: CSV Files and Text Analysis using the Rotten Tomatoes movie reviews data.
Once we have a sentiment score for each word, the sentiment for an entire review will simply be the average of the word sentiment scores. If the score for the whole review is 2 or greater, we'll call that a positive review; and if it is less than 2, it will be considered negative.
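The averaging and the cutoff at 2 can be sketched in a few lines. This is only an illustration: the helper names average and sentiment_label are hypothetical (the lab does not require them), and the word scores are made up rather than taken from the real data.

```python
def average(scores):
    """Return the mean of a non-empty list of numbers."""
    return sum(scores) / len(scores)


def sentiment_label(review_score):
    """Label a review score: 2 or greater is positive, below 2 is negative."""
    if review_score >= 2:
        return "positive"
    return "negative"


# Made-up word scores for a three-word review (not from the real data).
word_scores = [3.0, 1.5, 2.5]
review_score = average(word_scores)
print(review_score, sentiment_label(review_score))
```

Note that a score of exactly 2 counts as positive under this rule.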
To get started, create a new Python file called sentiment_analysis.py for this project.
For Lab 10, you wrote a program that asked the user for a word and then used that word to print out the average number of stars for reviews containing that word. Start by breaking this code up into two functions: get_2D_list_from_csv and word_score.
A word's sentiment score (the average number of stars for reviews containing that word) should be returned by the word_score function. To turn this into a function, we'll take out the input and print statements, and instead create a function which takes a word (a string) and the 2D list from the movie_reviews.csv file as arguments. It then returns the score you calculated (a float).
The get_2D_list_from_csv function has been completed for you below, and the word_score function has been started.
import csv


def get_2D_list_from_csv(filename):
    """
    Reads a CSV file into a 2D list and returns it
    Parameter:
        filename: a string, the name of a csv file
    Returns: a 2D list representing the contents of the file
    """
    with open(filename) as fileobject:
        csvreaderobject = csv.reader(fileobject)
        list_from_csv = list(csvreaderobject)
    return list_from_csv


def word_score(word, reviews_2d_list):
    """
    Calculates the sentiment score of a word on a scale of 0 to 4 with
    0 being very negative and 4 being very positive.
    Parameters:
        word: a string, the word to be scored
        reviews_2d_list: a 2d list of sample reviews that have been given sentiment scores
            the first column is the sentiment score of 0 to 4
            the second column is the text of the review
    Returns:
        a float representing word's sentiment
        if the word does not appear in the sample reviews, a negative number is returned to
        indicate that the word cannot be scored
    """
Note that in the docstring it says it will return a negative number if the word isn't present in any of the sample reviews - that's something you'll need to check for.
When you have these two functions working, you should be able to test them in the interactive shell like this:
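One possible way to fill in word_score and try it out is sketched below. It rests on several assumptions that may not match your Lab 10 code: every row is [score, text] with a score that int() can convert, a word "appears" in a review if it matches one of the review's whitespace-separated words after lowercasing, and -1.0 is the negative number used for words that cannot be scored. The sample_reviews list is made up so the example runs without the CSV file.

```python
def word_score(word, reviews_2d_list):
    """Average the scores of all sample reviews that contain the word."""
    total = 0
    count = 0
    for row in reviews_2d_list:
        stars = int(row[0])                    # csv.reader gives strings, so convert
        review_words = row[1].lower().split()  # assumption: match whole words only
        if word.lower() in review_words:
            total += stars
            count += 1
    if count == 0:
        return -1.0                            # word never appears: cannot be scored
    return total / count


# Made-up stand-in for the list returned by get_2D_list_from_csv.
sample_reviews = [
    ["4", "a wonderful and moving film"],
    ["1", "a dull and tedious film"],
    ["3", "wonderful performances throughout"],
]

print(word_score("wonderful", sample_reviews))  # 3.5, the average of 4 and 3
print(word_score("film", sample_reviews))       # 2.5, the average of 4 and 1
print(word_score("zebra", sample_reviews))      # -1.0, not in any review
```

With the real data you would pass the list returned by get_2D_list_from_csv("movie_reviews.csv") instead of sample_reviews.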
Next, write a function called text_score() with one parameter - the text of a new review that needs to be scored. The function definition should start like this:
def text_score(text):
    """
    Calculates the sentiment score of a phrase/sentence/paragraph
    which is made up of many words
    Parameters:
        text: a string, the text of a new review we want to find the sentiment of
    Returns: a float representing the text's sentiment
    """
    reviews = get_2D_list_from_csv("movie_reviews.csv")  # step 1
    # complete steps 2-4 here
This function will need to do the following things:
1. Read the movie_reviews.csv file into a 2D list (actually, I'd suggest making this a separate function, but you can call it in here)
2. Split the review text (text) into a list so you can treat each word separately
3. For each word in the list, call word_score() on the word
4. Average the word scores and return the result
For step 2, remember that you can use the string's split method to break apart a string into a list of words like this:
example_text = "it brought tears to my eyes"
example_text_list = example_text.split(" ")
print(example_text_list)
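One detail worth knowing when you split real reviews: split(" ") cuts at every single space, so two spaces in a row produce an empty string in the list, while split() with no argument cuts at any run of whitespace. The short comparison below uses a made-up string, spaced_text, with a double space after "it".

```python
spaced_text = "it  brought tears to my eyes"  # note the two spaces after "it"

# Splitting on a single space leaves an empty string where the extra space was.
print(spaced_text.split(" "))  # ['it', '', 'brought', 'tears', 'to', 'my', 'eyes']

# Splitting with no argument treats any run of whitespace as one separator.
print(spaced_text.split())     # ['it', 'brought', 'tears', 'to', 'my', 'eyes']
```

Either form works for cleanly spaced text like the example above.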
You may be thinking "Why are we loading the movie_reviews.csv file here instead of in word_score() where we need that data?" Indeed, that's a way you could do it. However, if you open the file and read all the data from it once for each word, that's a lot of work you're re-doing over and over again. Reading from files is a comparatively slow thing for computers to do, so if we need that same list many times, it's better to read it in once and then just pass it as an argument to any function that needs it.
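The "load once, pass it along" idea can be sketched as follows. To keep the block runnable without the CSV file, it uses a hypothetical helper text_score_from_list that receives the already-loaded list, a compact stand-in for word_score, and a made-up reviews list. Skipping words that cannot be scored, and returning -1.0 when no word can be scored, are assumptions rather than requirements of the lab.

```python
def word_score(word, reviews_2d_list):
    """Compact stand-in for the lab's word_score so this block runs on its own."""
    stars = [int(row[0]) for row in reviews_2d_list if word in row[1].split()]
    return sum(stars) / len(stars) if stars else -1.0


def text_score_from_list(text, reviews_2d_list):
    """Average the word scores of text, reusing one already-loaded list."""
    scores = []
    for word in text.split():
        score = word_score(word, reviews_2d_list)  # same list passed every time
        if score >= 0:                             # assumption: skip unscorable words
            scores.append(score)
    if not scores:
        return -1.0                                # assumption: nothing could be scored
    return sum(scores) / len(scores)


# In the lab's text_score, this list would be loaded once with
# get_2D_list_from_csv("movie_reviews.csv"); here it is made up.
reviews = [["4", "a wonderful moving film"], ["1", "a dull tedious film"]]

print(text_score_from_list("wonderful film", reviews))  # 3.25
print(text_score_from_list("dull film", reviews))       # 1.75
print(text_score_from_list("zebra", reviews))           # -1.0
```

However many words the text contains, the list is built only once and every call to word_score reuses it.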
You should perform some unit tests on your text_score function, for example by adding statements to the end of your .py file that run it on some example reviews.
Create a function that interacts with the user, allowing them to type in a phrase and then telling them the sentiment of what they typed. This is a good place to put all of your input and print statements. Here is an example of what this could look like, though feel free to get creative in how you present it:
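As a minimal sketch of such a function: the name analyze_from_user, the prompt, and the message wording are all hypothetical, and treating a negative score as "could not be scored" is an assumption about how your text_score behaves. The scoring function is passed in as a parameter, and input and print can be swapped out, so the demonstration at the bottom runs without any typing.

```python
def analyze_from_user(score_function, ask=input, show=print):
    """
    Ask for a review, score it, and report the sentiment.
    Parameters:
        score_function: a function that takes text and returns a float,
            such as the lab's text_score
        ask, show: default to input and print; replaceable for testing
    """
    text = ask("Enter a movie review: ")
    score = score_function(text)
    if score < 0:
        # Assumption: a negative score means no word could be scored.
        show("Sorry, none of those words appear in the sample reviews.")
    elif score >= 2:
        show(f"That sounds positive (score {score:.2f} out of 4).")
    else:
        show(f"That sounds negative (score {score:.2f} out of 4).")


# Demonstration with canned input and a stand-in scorer that always returns 3.2.
analyze_from_user(lambda text: 3.2, ask=lambda prompt: "a wonderful film")
```

In your own file, the call at the end would pass your real function, for example analyze_from_user(text_score).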
Make sure that you include a call to your user-interaction function at the end of your .py file so that running the file launches your program.
Congrats! You've created your first program that uses functions to divide a big problem into several smaller ones.