Analyzing Python List Operations

CS 66: Introduction to Computer Science II

References for this lecture

Problem Solving with Algorithms and Data Structures using Python

Section 3.6: https://runestone.academy/ns/books/published/pythonds/AlgorithmAnalysis/Lists.html

Python Documentation on Time Complexity: https://wiki.python.org/moin/TimeComplexity

Section 1.13: https://runestone.academy/ns/books/published/pythonds/Introduction/ObjectOrientedProgramminginPythonDefiningClasses.html

Section 2.1: https://runestone.academy/ns/books/published/pythonds/ProperClasses/a_proper_python_class.html

Review: Anagram Detection Activity

An anagram is a word or phrase that can be formed by rearranging the letters of a different word or phrase.

Examples of anagrams include

silent, listen
night, thing
the morse code, here come dots
eleven plus two, twelve plus one

Problem: Write a function that will tell you if two strings are anagrams.

The book provides four different solutions in Section 3.4. Three of them are reproduced below.

Each group will be assigned one of these solutions. Do the following as a group.

Test the code on several inputs of different sizes.
Instrument the code to measure the time it takes on different-sized inputs.
Give examples of best, worst, and average case inputs.
Determine what the Big-O of the algorithm is, and be ready to explain why.

If you have time, you can check out what it says in the book, but try to analyze it without looking first!
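One possible way to instrument a solution is to wrap each call with time.perf_counter. This is only a minimal sketch: the helper time_anagram_function and the stand-in placeholder_solution are hypothetical names, not from the book, and the random anagrams it generates are not necessarily best-case or worst-case inputs for your assigned solution.

```python
import random
import string
import time


def placeholder_solution(s1, s2):
    # Stand-in so the sketch runs on its own; replace it with your group's function.
    return sorted(s1) == sorted(s2)


def time_anagram_function(func, n, trials=5):
    """Return the average number of seconds func takes on two random anagrams of length n."""
    total = 0.0
    for _ in range(trials):
        # Build a random lowercase string, then shuffle its letters to get an anagram of it.
        letters = random.choices(string.ascii_lowercase, k=n)
        s1 = "".join(letters)
        random.shuffle(letters)
        s2 = "".join(letters)

        # Only the call itself is timed, not the input generation.
        start = time.perf_counter()
        func(s1, s2)
        total += time.perf_counter() - start
    return total / trials


for n in [1000, 2000, 4000, 8000]:
    print(n, time_anagram_function(placeholder_solution, n))
```

Doubling n each time makes it easier to see how the running time grows: compare each timing with the one before it.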

Solution 1: Checking off

This code works by converting the second string into a list, then searching through the list for each character from the first string and replacing it with None when found.

For example, if given "silent" and "listen", the list would start out as

['l','i','s','t','e','n']

When searching for 's', it becomes ['l','i',None,'t','e','n']

When searching for 'i', it becomes ['l',None,None,'t','e','n']

... and so on until the list becomes [None,None,None,None,None,None]

In [1]:

def anagramSolution1(s1,s2):
    stillOK = True
    if len(s1) != len(s2):
        stillOK = False

    alist = list(s2)
    pos1 = 0

    while pos1 < len(s1) and stillOK:
        pos2 = 0
        found = False
        while pos2 < len(alist) and not found:
            if s1[pos1] == alist[pos2]:
                found = True
            else:
                pos2 = pos2 + 1

        if found:
            alist[pos2] = None
        else:
            stillOK = False

        pos1 = pos1 + 1
        
        #uncomment this if you want to see what the list looks like at each step
        #print(alist)

    return stillOK

print(anagramSolution1('silent','listen'))

True

Solution 2: Sort and Compare

This solution starts by converting both strings to lists and then sorting them. Once in sorted order, it goes through and checks that each corresponding item in the list is the same.

For example, if given "silent" and "listen", it would turn them into lists ['s', 'i', 'l', 'e', 'n', 't'] and ['l', 'i', 's', 't', 'e', 'n'].

Then, after sorting each list, we get ['e', 'i', 'l', 'n', 's', 't'] and ['e', 'i', 'l', 'n', 's', 't'].

We then compare e to e, then i to i, then l to l and so on. If we ever find two that don't match, we know it isn't an anagram. If we get to the end and they all match, it is an anagram.

In [2]:

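def anagramSolution2(s1,s2):
    #strings of different lengths can't be anagrams, and the loop below assumes equal lengths
    if len(s1) != len(s2):
        return False

    alist1 = list(s1)
    alist2 = list(s2)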

    alist1.sort()
    alist2.sort()
    
    #uncomment these if you want to see the sorted lists
    #print(alist1)
    #print(alist2)

    pos = 0
    matches = True

    while pos < len(s1) and matches:
        if alist1[pos]==alist2[pos]:
            pos = pos + 1
        else:
            matches = False

    return matches

print(anagramSolution2('silent','listen'))

True
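For comparison, the same sort-and-compare idea can be written more compactly with the built-in sorted() function. This is a sketch rather than the book's code, and the name anagram_sorted is made up for illustration. The work done is the same: two sorts followed by an item-by-item comparison of the resulting lists.

```python
def anagram_sorted(s1, s2):
    # sorted() accepts any iterable and returns a new sorted list,
    # so the strings don't need to be converted to lists first.
    # Comparing lists with == checks the lengths and then each pair of items.
    return sorted(s1) == sorted(s2)


print(anagram_sorted('silent', 'listen'))   # True
print(anagram_sorted('night', 'things'))    # False
```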

Solution 4: Count and Compare

This solution creates a list of letter frequencies for each string. Since there are 26 letters in the alphabet, the lists will each have 26 entries - the first entry is the number of occurrences of 'a', the second is the number of occurrences of 'b', and so on.

We can then loop through these frequency lists and compare them item by item to see if they're the same.

For example, given inputs 'elevenplustwo' and 'twelveplusone', you end up with the frequency lists

[0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 2, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0]

and [0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 2, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0]

Looping through these lists entry by entry will show that they are the same.

On the other hand, if given inputs 'granma' and 'anagram', you'd get the frequency lists

[2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

[3, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

And you can determine they are not anagrams, because the first list has a 2 in the a position while the second one has a 3.

In [3]:

def anagramSolution4(s1,s2):
    c1 = [0]*26
    c2 = [0]*26

    for i in range(len(s1)):
        pos = ord(s1[i])-ord('a')
        c1[pos] = c1[pos] + 1

    for i in range(len(s2)):
        pos = ord(s2[i])-ord('a')
        c2[pos] = c2[pos] + 1
        
    #uncomment these if you want to see the letter frequency lists
    #print(c1) 
    #print(c2)

    j = 0
    stillOK = True
    while j<26 and stillOK:
        if c1[j]==c2[j]:
            j = j + 1
        else:
            stillOK = False

    return stillOK

print(anagramSolution4('elevenplustwo','twelveplusone'))

True
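The counting idea can also be sketched with collections.Counter from the standard library, which stores the frequencies in a dictionary instead of a fixed list of 26 entries. This is not one of the book's solutions, and the name anagram_counter is made up for illustration. Unlike anagramSolution4, which assumes the strings contain only the lowercase letters a through z, this version also handles spaces and other characters.

```python
from collections import Counter


def anagram_counter(s1, s2):
    # Counter maps each character to the number of times it occurs,
    # so two strings are anagrams exactly when their Counters are equal.
    return Counter(s1) == Counter(s2)


print(anagram_counter('elevenplustwo', 'twelveplusone'))    # True
print(anagram_counter('granma', 'anagram'))                 # False
print(anagram_counter('the morse code', 'here come dots'))  # True
```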

Group Activity Problem 1

We're going to run some experiments to see if we can guess what the Big-O of various Python list operations is.

Get the timing_list_operations.py file.
- generates lots of different random lists of different sizes
- times the code and plots the running times
- as is, will generate 20 different lists each of sizes 100000, 200000, ... , 1000000

In [ ]:

list_sizes = range(100000,1000001,100000)
results = run_list_op_timing_experiments(list_sizes,20)
plot_results(list_sizes,results)
display_results(list_sizes,results)

You can change what code is timed by finding this part:

In [ ]:

        ### BEGIN CODE BEING TIMED

        0 in test_list_copy  #testing the built-in search operator
        #x = test_list_copy[random_index]  #testing list access
        #test_list_copy.sort()
        
        ### END CODE  BEING TIMED

What can you conclude about the run time of the in operator based on what you see?
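If you want to try a similar experiment without the course file, here is a minimal standalone sketch using the standard timeit module. It is not the contents of timing_list_operations.py; the function name time_in_operator and the choice to fill the lists with positive integers (so that 0 is never found) are assumptions made for illustration.

```python
import random
import timeit


def time_in_operator(n, repeats=20):
    """Return the average number of seconds one `0 in test_list` takes on a list of n items."""
    # Every value is at least 1, so the search for 0 never succeeds early.
    test_list = [random.randint(1, 1_000_000) for _ in range(n)]
    total = timeit.timeit(lambda: 0 in test_list, number=repeats)
    return total / repeats


for n in [100_000, 200_000, 400_000]:
    print(n, time_in_operator(n))
```

Because the list size doubles at each step, the ratio between consecutive timings hints at how the running time depends on $n$.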

Group Activity Problem 2

In the above example, you tested the 0 in test_list_copy operation. Now try the following operations. For each one, try to guess the Big-O from the run time results (you may need to try some bigger $n$ to see some of these clearly).

If you see a result like 171.45672101µ, the µ stands for micro, so it means $171.45672101 \times 10^{-6}$.

In [ ]:

x = test_list_copy[random_index]  #access a random item in a list - you'll have to generate a random int first
test_list_copy.sort() #sort the list
test_list_copy.pop()  #remove the last item in the list
test_list_copy.pop(0) #remove the first item in the list

Discuss: Are you surprised by any of these results?
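To compare pop() and pop(0) side by side outside the course file, one option is a small timeit experiment like the sketch below. The helper name time_pop is hypothetical, and the list sizes are arbitrary choices; run it and compare the two columns before checking the references.

```python
import timeit


def time_pop(n, index=None, repeats=1000):
    """Return the average number of seconds per pop on a list that starts with n items.

    n should be larger than repeats, since every timed call removes one item.
    """
    test_list = list(range(n))
    if index is None:
        operation = lambda: test_list.pop()        # remove the last item
    else:
        operation = lambda: test_list.pop(index)   # remove the item at the given index
    return timeit.timeit(operation, number=repeats) / repeats


for n in [100_000, 200_000, 400_000]:
    print(n, "pop():", time_pop(n), "pop(0):", time_pop(n, 0))
```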

Once you have an idea of what the Big-O is for these operations, look to see what the textbook authors say it is here: https://runestone.academy/ns/books/published/pythonds/AlgorithmAnalysis/Lists.html

Or, consult the official Python documentation here: https://wiki.python.org/moin/TimeComplexity

White Board Talk

If there's time, we'll draw some pictures on the white board to see how Python lists are stored in memory and how that affects the Big-O of all these operations.

Points to note:

Python keeps track of lists in consecutive chunks of computer memory - this data structure is often called an array
Consecutive memory locations allow $O(1)$ access to items by index in the list - this is often called random access
Python allocates a certain amount of space for the list. If it outgrows that, it will allocate a new, bigger memory space and copy everything over.
The worst case for append() happens when it triggers a re-allocation to the bigger memory space, so a single append is technically $O(n)$. But this is guaranteed to happen infrequently enough that it's still $O(n)$ to append $n$ items in total, thus the amortized worst case for appending one item is $O(1)$, which is why the textbook says append() is $O(1)$. For more information, see https://wiki.python.org/moin/TimeComplexity
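One way to observe how rarely re-allocation happens is to watch the list's memory footprint while appending. The sketch below assumes CPython, where sys.getsizeof on a list includes the spare space reserved for future items; the function name count_reallocations is made up for illustration, and the exact counts depend on the Python version.

```python
import sys


def count_reallocations(n):
    """Append n items one at a time and count how often the list's memory footprint changes."""
    items = []
    last_size = sys.getsizeof(items)
    reallocations = 0
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:
            # The footprint changed, so Python reserved a bigger block of memory.
            reallocations += 1
            last_size = size
    return reallocations


for n in [10, 100, 1000, 10000]:
    print(n, "appends ->", count_reallocations(n), "size changes")
```

The number of size changes should grow much more slowly than the number of appends, which is the behavior the amortized argument relies on.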

Here's a blank memory diagram in case I need to draw digitally

Defining new types

In Python, we use classes to create new types

A class defines two things:

Data/Attributes: what do objects of this type look like?
Methods: what can you do with objects of this type?

Example: Date object

Let's look at the date type

The date class is defined in the datetime module, and we can import it and use it in our code

In [5]:

import datetime

#creating a new date object
decl_ind_date = datetime.date(1776,7,4) 

#datetime.date is a type
print( type(decl_ind_date) )

print("Here's what it looks like when we print a date:",decl_ind_date)

print("Here's what the data that makes up a date looks like:")
print( decl_ind_date.month )
print( decl_ind_date.day )
print( decl_ind_date.year )

#you can call methods on dates - here's one thing you can do with a date
#weekday method returns the number of the day of the week this date fell on (0 = Monday, 6 = Sunday)
print( decl_ind_date.weekday() ) 

<class 'datetime.date'>
Here's what it looks like when we print a date: 1776-07-04
Here's what the data that makes up a date looks like:
7
4
1776
3

month, day, and year are attributes - data values associated with the object

weekday() is a method - like a function, but you call it using dot notation on a date object

How do we write our own classes?

Classes allow you to encapsulate data and actions-on-that-data together into one thing - this is an abstraction technique - it's good programming.

A class defines how objects behave - it is a blueprint that can be used to create many different objects of that type

Syntax:

keyword class
a name you decide (by convention, start with uppercase letter)
a colon :
indented list of function definitions (i.e., method definitions)
- each method has a parameter called self which refers to the particular object being used at that time

In [6]:

class Motivator:
    
    def message1(self):
        print("You can do it!")
        
    def message2(self):
        print("I'm proud of you!")

In [7]:

m = Motivator()
m.message2()
print( type(m) )

I'm proud of you!
<class '__main__.Motivator'>

Notice that you always have to make self a parameter, but you don't pass it in the parentheses like other arguments.

self is the object (here, m) that the method was called on
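To make this concrete, the sketch below repeats a cut-down Motivator class and adds a hypothetical method, who_am_i, whose only purpose is to return self so we can inspect it. It also calls message2 through the class, which makes the normally hidden argument explicit.

```python
class Motivator:

    def message2(self):
        print("I'm proud of you!")

    def who_am_i(self):
        # Hypothetical extra method, added only to expose what self refers to.
        return self


m = Motivator()

# self is the very same object the method was called on
print(m.who_am_i() is m)    # True

# m.message2() is shorthand for calling the method on the class and passing m as self
Motivator.message2(m)       # prints I'm proud of you!
```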

Defining Classes with Attributes

Any attribute can be accessed in any of the class's methods using self. Each object of the class has a different set of all the attributes (just like different date objects represent different dates on the calendar)

Initialize attributes using the special __init__() method, which will be invoked whenever a new object of this type is created.

In [8]:

class PersonalMotivator:
    
    def __init__(self,n):
        self.name = n
    
    def message1(self):
        print("You can do it,",self.name)
        
    def message2(self):
        print("I'm proud of you,",self.name)

In [9]:

#creates two objects of the PersonalMotivator class
eric_motivator = PersonalMotivator("Eric")
tim_motivator = PersonalMotivator("Tim")


eric_motivator.message1()
eric_motivator.message2()
tim_motivator.message1()

You can do it, Eric
I'm proud of you, Eric
You can do it, Tim

Group Activity Problem 3:

Where does self.name get its value from?
When I call eric_motivator.message2(), what is self?
How would I create a third object of the PersonalMotivator class? Do I have to pass it a name?

In [ ]: