Problem Solving with Algorithms and Data Structures using Python
Sections 6.1-6.5 https://runestone.academy/ns/books/published/pythonds/SortSearch/toctree.html
As long as you can iterate through items in a data structure, you can write a sequential search or linear search - a search that runs in $O(n)$ time.
We've been doing this all semester long: 1st week activity, search in OrderedList class, etc.
linear search of Python list
15 in [3,5,2,4,1]
False
def sequentialSearch(alist, item):
pos = 0
found = False
while pos < len(alist) and not found:
if alist[pos] == item:
found = True
else:
pos = pos+1
return found
testlist = [1, 2, 32, 8, 17, 19, 42, 13, 0]
print(sequentialSearch(testlist, 3))
print(sequentialSearch(testlist, 13))
False True
If the item is actually in the list
If the item is not actually in the list
If we knew that the list was in sorted order is it possible to make the sequential search end early? How?
testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42]
print(sequentialSearch(testlist, 3))
print(sequentialSearch(testlist, 13))
False True
Binary Search works by looking at the middle item within a given range and throwing out at least half the items on each iteration.
only works on a sorted data structure
Let's draw some pictures on the board to see how this works.
def binarySearch(alist, item):
first = 0
last = len(alist)-1
found = False
while first<=last and not found:
midpoint = (first + last)//2
if alist[midpoint] == item:
found = True
else:
if item < alist[midpoint]:
last = midpoint-1
else:
first = midpoint+1
return found
testlist = [0, 1, 2, 8, 13, 17, 19, 32, 42,]
print(binarySearch(testlist, 3))
print(binarySearch(testlist, 13))
False True
Write down how many items you have to look at in the worst case for lists of the following size:
Things to notice
Binary Search is $O(\log n)$ - inverse of exponential growth
This is fantastic!
The book implemented the Ordered List Abstract Data Type as a Linked List: https://runestone.academy/ns/books/published/pythonds/BasicDS/ImplementinganOrderedList.html
If we instead implement Ordered List as an Array (Python list)
add and remove still $O(n)$ because items are shifted (but finding the spot is a searchmy_collection = [None,1,None,None,None,5,6,7,None,9,10,None,None,13,None,15,None,None,18,None]
Now, we can search if a value is in the collection by looking it up at its index. If it is not in the collection, None will be returned.
print( my_collection[5] )
print( my_collection[18] )
print( my_collection[4] )
print( my_collection[17] )
5 18 None None
Because list access is $O(1)$, this does search in $O(1)$. So, are the downsides to this approach?
Could we make an UnorderedList this way? How about an OrderedList? How about a Set?
Hash tables work like the above approach, except it allows for values outside of indices of the table using a hash function.
Hash functions can be any function that transforms a value into a suitable index for the list.
For example:
hash_table_size = 10
hash_table = [None] * 10 #initialize all 10 spots to None
def simple_hash(value,num_items):
return value % num_items
print(hash_table)
[None, None, None, None, None, None, None, None, None, None]
Now let's put a value into the hash table.
val = 87
val_hash = simple_hash(val,hash_table_size)
hash_table[val_hash] = val
print(hash_table)
[None, None, None, None, None, None, None, 87, None, None]
Doing a few more:
hash_table[simple_hash(33,hash_table_size)] = 33
hash_table[simple_hash(112,hash_table_size)] = 112
hash_table[simple_hash(19,hash_table_size)] = 19
print(hash_table)
[None, None, 112, 33, None, None, None, 87, None, 19]
Now we can search the hash table like this:
hash_table[simple_hash(33,hash_table_size)] == 33 #is 33 at the spot is should be in the hash table?
True
hash_table[simple_hash(12,hash_table_size)] == 12 #is 12 at the spot is should be in the hash table?
False
There's still a problem with this approach. What is it?
When two different items end up with the same hash value, it is called a collision.
This is a problem since both items can't occupy the same memory location.
Collision Resolution is the process of finding a new location for an item when there is a collision - we'll talk about some approaches next time.
If you want to put strings in your hash table, you have to come up with a way to convert them into numbers.
One approach is to use the ord function for each character, which returns the unicode value used to represent the character in memory.
ord('c')
99
ord('a')
97
ord('t')
116
def string_hash(astring, tablesize):
sum = 0
for pos in range(len(astring)):
sum = sum + ord(astring[pos])
return sum%tablesize
print( string_hash("cat",10) )
print( string_hash("dog",10) )
print( string_hash("chicken",10) )
2 4 5