Searching, Removing From, and Balancing Binary Search Trees
CS 66: Introduction to Computer Science II
A post-class survey on your thoughts on ChatGPT
Please fill this out
https://forms.gle/E6FKsKtvDkw1QeQt7
Reminder - your name will be removed before we look at your answers
References for this lecture
Problem Solving with Algorithms and Data Structures using Python
Sections 7.11 - 7.14: https://runestone.academy/runestone/books/published/pythonds/Trees/toctree.html
Review: Binary Search Tree Nodes
BST Nodes have at most 2 children, called left child and right child
Nodes in left subtree have value less than this node
Nodes in right subtree have value more than this node
This is true for every node!
Binary Search Tree Code from last time
Here’s the code we came up with in class last time
class BSTNode:
def __init__(self,initdata):
self.data = initdata
self.left = None
self.right = None
def __repr__(self):
return repr(self.data)+"("+repr(self.left)+","+repr(self.right)+")"
def __str__(self):
str_rep = ""
if self.left != None:
str_rep += " "+str(self.left)
str_rep += " "+str(self.data)
if self.right != None:
str_rep += " "+str(self.right)
return str_rep
def getData(self):
return self.data
def setData(self,newdata):
self.data = newdata
def getLeft(self):
return self.left
def setLeft(self,newleft):
self.left = newleft
def getRight(self):
return self.right
def setRight(self,newright):
self.right = newright
#put this method into BSTNode class
def add(self,new_val):
#implement the code here
if new_val < self.data:
if self.left == None:
new_node = BSTNode(new_val)
self.setLeft( new_node )
else:
self.getLeft().add(new_val)
elif new_val > self.data:
if self.right == None:
new_node = BSTNode(new_val)
self.setRight( new_node )
else:
self.getRight().add(new_val)
else:
raise Exception("trying to add duplicate "+str(new_val))
class BSTSet:
def __init__(self):
self.__root = None #start with an empty tree!
def add(self,new_val):
if self.__root == None:
self.__root = BSTNode(new_val)
else:
self.__root.add(new_val)
def __repr__(self):
return repr(self.__root)
def __str__(self):
return str(self.__root)
myset = BSTSet()
myset.add(4)
myset.add(7)
myset.add(2)
myset.add(9)
print(repr(myset))
print(str(myset))
4(2(None,None),7(None,9(None,None)))
2 4 7 9
Group Activity Problem 1
Implement a search method in the BSTNode class - again use recursion.
Write the associated method in BSTSet (suggestion: call it __contains__ in BSTSet so that you can use it with the in operator.
myset = BSTSet()
myset.add(4)
myset.add(7)
myset.add(2)
myset.add(9)
print( 7 in myset ) #should be True
print( 1 in myset ) #should be False
Group Activity Problem 2
What would the tree look like in a worst-case search? So then what is the computational complexity of your search? What about add?
After you a good guess, check out https://runestone.academy/ns/books/published/pythonds/Trees/SearchTreeAnalysis.html to see if you agree with the book.
Removing Items
Removing nodes is the most difficult part.
Three cases to consider:
- The node has no children
- set the reference to it from the parent to None
- The node has 1 child
- set the reference from the parent to the 1 child
- The node has 2 children
- replacee this node’s value with that of its successor
- delete the node that the successor value came from
Other Considerations
- the code will look different depending on whether the node being removed is itself a left child, a right child, or the root
- we haven’t been keeping track of the number of items - how could wee do that
- we might be able to make a BST to implement an ordered list ADT (defind here: https://runestone.academy/ns/books/published/pythonds/BasicDS/TheOrderedListAbstractDataType.html ).
- The main difficulty is implementing an index method - can you find the number of items in the tree that are less than a given item?
Unbalanced Binary Search Trees
In the worst case, a BST could be completely skewed in one direction
All operations: $O(n)$
Measuring Balance at Each Node
$balanceFactor = height(leftSubTree) - height(rightSubTree)$
AVL Trees
Inventors: G.M. Adelson-Velskii and E.M. Landis
Idea: balance factor is only allowed to be 0, 1, or -1
Approach: Whenever you add or remove items, if the balance factor is > 1 or < -1, perform rotations to get it back in balance (check all nodes back up to the root)
Result: The tree will bee guaranteed to have a height of at most $1.44*\log_2 N$
add, search, and remove will run in $O(\log N)$ time
A simple left rotation
A more complicated right rotation
In this example we
- promote C in place of E
- move C’s right child (D) and make it into E’s left child
- make E the new right child of C
Almost there…
balanceFactor 2 -> right rotation
balanceFactor -2 -> left rotation
Problem:
After left rotation:
Solution
$balanceFactor = -2$ and right child’s $balanceFactor=1$, do right rotation on right child then left rotation at this node
$balanceFactor = 2$ and left child’s $balanceFactor=-1$, do left rotation on left child then right rotation at this node
Summary of Different Set Implementations
Which data structure should you use to implement a Set?
Note: Maps are similar https://runestone.academy/ns/books/published/pythonds/Trees/SummaryofMapADTImplementations.html
Operation |
Sorted List |
Hash Table* |
Binary Search Tree |
AVL Tree |
|---|---|---|---|---|
add |
\(O(n)\) |
\(O(1)\) |
\(O(n)\) |
\(O(\log_2{n})\) |
in (search) |
\(O(\log_2{n})\) |
\(O(1)\) |
\(O(n)\) |
\(O(\log_2{n})\) |
remove |
\(O(n)\) |
\(O(1)\) |
\(O(n)\) |
\(O(\log_2{n})\) |
- Hash Table is actually $O(1)$ in the average case, but this is usually achievable with a good hash function and low load factor
Is it ever better to use an AVL Tree instead of a Hash Table as the underlying data structure? Are there memory differences? What happens if you are adding and removing items a lot? What happens when the hash table fills up?