Lab 12: Sentiment Analysis Program

This lab is a little different from the other labs we've done so far. The purpose of this lab is to get some experience dividing a larger problem into several smaller ones that can each be isolated as a separate function. We'll continue to add new functions with new program features in the next lab (Lab 13), so we'll wait to turn everything in until we've completed both Lab 12 and Lab 13.

The program you are going to write solves an important problem in the field of artificial intelligence and machine learning: you will create a sentiment analyzer.

Sentiment analysis is the problem of automatically detecting the general attitude behind a piece of written text. For example, does the text have a positive or negative attitude? Is it sarcastic? Is it a comment or a question? Is the mood happy, sad, angry, frustrated, etc.? A sentiment analyzer can help an organization sift through comments users have left on its website comment form, help flag Amazon product reviews that need a response from customer service, or help political campaigns get a feeling for what kinds of Tweets people are posting about a candidate or issue.

Our sentiment analyzer is going to automatically determine whether a movie review has a positive or negative sentiment. The effect is something like this:

Our sentiment analyzer will work by first determining the sentiment of each word in the review, based on how that word is used in sample movie reviews that have already been scored on a range of 0 to 4 (with 0 being the most negative, 4 being the most positive, and 2 being neutral). The good news is that you already did this in the challenge exercise for Lab 10: CSV Files and Text Analysis using the Rotten Tomatoes movie reviews data.

Once we have a sentiment score for each word, the sentiment for an entire review will simply be the average of the word sentiment scores. If the score for the whole review is 2 or greater, we'll call that a positive review; and if it is less than 2, it will be considered negative.
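To make the averaging rule concrete, here is a small sketch. The word scores in it are made up for illustration (they do not come from the real review data), and the variable names are just placeholders.

```python
# Hypothetical word scores, invented for illustration only.
example_word_scores = [2.5, 1.0, 3.0, 2.5]

# The review score is the average of the word scores.
average = sum(example_word_scores) / len(example_word_scores)

# 2 or greater counts as positive; anything below 2 counts as negative.
label = "positive" if average >= 2 else "negative"

print(average, label)  # 2.25 positive
```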

To get started, create a new Python file called sentiment_analysis.py for this project.

Exercise 1: Splitting up our previous code into two functions

For Lab 10, you wrote a program that asked the user for a word and then used that word to print out the average number of stars for reviews containing that word. Start by breaking this code up into two functions: get_2D_list_from_csv and word_score.

A word's sentiment score (the average number of stars for reviews containing that word) should be returned by the word_score function. To turn your Lab 10 code into a function, take out the input and print statements, and instead create a function that takes a word (a string) and the 2D list from the movie_reviews.csv file as arguments. It then returns the score you calculated (a float).

The get_2D_list_from_csv function has been completed for you below, and the word_score function has been started.

import csv

def get_2D_list_from_csv(filename):
    """
    Reads a CSV file into a 2D list and returns it
    
    Parameter:
        filename: a string, the name of a csv file
        
    Returns: a 2D list representing the contents of the file
    """
    with open(filename) as fileobject:
        csvreaderobject = csv.reader(fileobject)
        list_from_csv = list(csvreaderobject)
    return list_from_csv

def word_score(word, reviews_2d_list):
    """
    Calculates the sentiment score of a word on a scale of 0 to 4 with
    0 being very negative and 4 being very positive.
    
    Parameters:
        word: a string, the word to be scored
        reviews_2d_list: a 2d list of sample reviews that have been given sentiment scores
                the first column is the sentiment score of 0 to 4
                the second column is the text of the review
    Returns:
        a float representing word's sentiment
        if the word does not appear in the sample reviews, a negative number is returned to
        indicate that the word cannot be scored
    """
    
    

Note that in the docstring it says it will return a negative number if the word isn't present in any of the sample reviews - that's something you'll need to check for.
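One way to think about that check is sketched below on a tiny hand-made list. The helper name average_or_negative, the toy rows, and the choice to match whole words with split are all assumptions made for this illustration. Your own word_score may be organized differently. Notice that csv.reader gives you every value as a string, so the score column has to be converted to a number before you can average it.

```python
def average_or_negative(scores):
    """Hypothetical helper: average a list of numbers, or return -1.0 if it is empty."""
    if len(scores) == 0:
        return -1.0  # negative value signals "nothing to average"
    return sum(scores) / len(scores)

# Made-up rows in the same shape as the CSV data: [score, review text].
# csv.reader produces strings, which is why the scores are in quotes here.
toy_rows = [["4", "a joyful ride"], ["1", "a dull ride"], ["3", "pure fun"]]

# Collect the scores of the reviews that contain the word "ride".
ride_scores = []
for row in toy_rows:
    if "ride" in row[1].split(" "):
        ride_scores.append(float(row[0]))  # convert "4" to 4.0

print(average_or_negative(ride_scores))  # 2.5
print(average_or_negative([]))           # -1.0
```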

When you have these two functions working, you should be able to test them in the interactive shell like this:
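If you want to see what get_2D_list_from_csv hands back before trying it on the real data, the sketch below runs it on a tiny stand-in file. The file name toy_reviews.csv and its two rows are invented for this example. The function itself is the one given above.

```python
import csv
import os
import tempfile

def get_2D_list_from_csv(filename):
    """Reads a CSV file into a 2D list and returns it (same as the lab's version)."""
    with open(filename) as fileobject:
        csvreaderobject = csv.reader(fileobject)
        list_from_csv = list(csvreaderobject)
    return list_from_csv

# Write a small stand-in CSV file with made-up rows, then read it back.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "toy_reviews.csv")
    with open(path, "w", newline="") as fileobject:
        csv.writer(fileobject).writerows([["4", "a joyful ride"], ["1", "a dull ride"]])
    toy_reviews = get_2D_list_from_csv(path)

print(toy_reviews)
# [['4', 'a joyful ride'], ['1', 'a dull ride']]
```

Each inner list is one row of the file, and every value (including the score) is a string.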

Exercise 2: Scoring a whole review

Next, write a function called text_score() with one parameter - the text of a new review that needs to be scored. The function definition should start like this:

def text_score(text):
    """
    Calculates the sentiment score of a phrase/sentence/paragraph
    which is made up of many words
    
    Parameters:
        text: a string, the text of a new review we want to find the sentiment of
        
    Returns: a float representing the text's sentiment
    """
    
    reviews = get_2D_list_from_csv("movie_reviews.csv")  # step 1
    # complete steps 2-4 here

This function will need to do the following things:

  1. Load the movie_reviews.csv file into a 2D list by calling get_2D_list_from_csv (the starter code above already does this for you)
  2. Split the new review text (i.e., the parameter text) into a list so you can treat each word separately
  3. Loop through this list, doing the following with each word:
    • call word_score() on the word
    • check if it actually scored it or returned a negative value
    • if it did score it, add its score on to an accumulator variable, and add 1 to a counter to keep track of how many words were scored
  4. After the loop finishes, use your accumulator and counter to compute the average score for the words from the new review text and return it. Be careful: if no words were scored, the counter will be 0 and dividing by it causes a divide-by-zero error. Check for that edge case and return a negative number when no words could be scored.
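The accumulator-and-counter pattern from steps 3 and 4 can be sketched as follows. To keep the example runnable on its own, it uses a hypothetical stand-in called toy_word_score with a few made-up scores instead of your real word_score and the CSV data. Treat it as an illustration of the loop's shape, not as the finished text_score.

```python
# Made-up scores standing in for the real sample reviews.
toy_scores = {"brought": 3.0, "tears": 1.5, "eyes": 2.25}

def toy_word_score(word):
    """Hypothetical stand-in for word_score: returns -1 for unknown words."""
    return toy_scores.get(word, -1)

def toy_text_score(text):
    total = 0.0   # accumulator: sum of the scores found so far
    scored = 0    # counter: how many words were actually scored
    for word in text.split(" "):
        score = toy_word_score(word)
        if score >= 0:        # a negative value means "could not be scored"
            total += score
            scored += 1
    if scored == 0:
        return -1.0           # avoid dividing by zero
    return total / scored

print(toy_text_score("it brought tears to my eyes"))  # 2.25
print(toy_text_score("zzz qqq"))                      # -1.0
```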

For step 2, remember that you can use the string's split method to break apart a string into a list of words like this:

example_text = "it brought tears to my eyes"
example_text_list = example_text.split(" ")
print(example_text_list)
['it', 'brought', 'tears', 'to', 'my', 'eyes']
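One detail worth knowing: split(" ") and split() with no argument behave differently when the text contains extra spaces. The short sketch below shows the difference on a made-up string. Whether you also want to lowercase the text depends on how the sample reviews are written, so check your data before assuming either way.

```python
messy_text = "It  brought tears to my eyes"  # note the two spaces after "It"

# Splitting on a single space leaves an empty string where the extra space was.
print(messy_text.split(" "))
# ['It', '', 'brought', 'tears', 'to', 'my', 'eyes']

# Splitting with no argument treats any run of whitespace as one separator.
print(messy_text.split())
# ['It', 'brought', 'tears', 'to', 'my', 'eyes']

# lower() makes "It" and "it" look the same before splitting.
print(messy_text.lower().split())
# ['it', 'brought', 'tears', 'to', 'my', 'eyes']
```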

You may be thinking "Why are we loading the movie_reviews.csv file here instead of in word_score() where we need that data?" Indeed, that's a way you could do it. However, if you open the file and read all the data from it once for each word, that's a lot of work you're re-doing over and over again. Reading from files is a comparatively slow thing for computers to do, so if we need that same list many times, it's better to read it in once and then just pass it as an argument to any function that needs it.
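Here is a small sketch of that "load once, pass it along" idea. The loader toy_load_reviews is a hypothetical stand-in that returns made-up rows and counts how many times it runs, so you can see that the data is only loaded a single time no matter how many words are checked.

```python
load_count = 0  # how many times the (pretend) file has been read

def toy_load_reviews():
    """Hypothetical stand-in for reading the CSV file; counts each call."""
    global load_count
    load_count += 1
    return [["4", "a joyful ride"], ["1", "a dull ride"]]

def count_reviews_with(word, reviews):
    """Counts reviews containing the word, using the list passed in as an argument."""
    count = 0
    for row in reviews:
        if word in row[1].split(" "):
            count += 1
    return count

reviews = toy_load_reviews()  # load the data once...

for word in ["a", "joyful", "dull", "ride"]:
    # ...and pass the same list to every call that needs it.
    print(word, count_reviews_with(word, reviews))

print("times loaded:", load_count)  # times loaded: 1
```

If count_reviews_with called toy_load_reviews itself, the count would be 4 instead of 1 for this loop, and it would grow with every word in a longer review.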

You should perform some unit tests on your text_score function. Here are some examples of statements you could add to the end of your .py file that would run it on some example reviews.

Exercise 3: Write a function for user interaction

Create a function that interacts with the user, allowing them to type in a phrase and then telling them the sentiment of what they typed. This is a good place to put all of your input and print statements. Here is an example of what this could look like, though feel free to get creative in how you present it:
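As one possible shape for this, consider the sketch below. The names sentiment_label and run_sentiment_analyzer, the prompt text, and the messages are all invented for illustration. So that the sketch can run on its own, the scoring function is passed in as a parameter. In your own file you could simply call text_score directly inside the function instead.

```python
def sentiment_label(score):
    """Hypothetical helper: turns a numeric score into a message for the user."""
    if score < 0:
        return "Sorry, none of those words could be scored."
    if score >= 2:
        return "positive"
    return "negative"

def run_sentiment_analyzer(score_function):
    """Asks the user for a phrase, scores it, and prints the result."""
    phrase = input("Enter a phrase to analyze: ")
    score = score_function(phrase)
    print("Score:", score)
    print("Sentiment:", sentiment_label(score))
```

With your Exercise 2 function in the same file, the call at the end of the file would be run_sentiment_analyzer(text_score).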

Make sure that you include a call to your user-interaction function at the end of your .py file so that the program starts when the file is run.

Congrats! You've created your first program that uses functions to divide a big problem into several smaller ones.