Searching, Removing From, and Balancing Binary Search Trees

4 minute read

CS 66: Introduction to Computer Science II

A post-class survey on your thoughts on ChatGPT

Please fill this out

https://forms.gle/E6FKsKtvDkw1QeQt7

Reminder - your name will be removed before we look at your answers

References for this lecture

Problem Solving with Algorithms and Data Structures using Python

Sections 7.11 - 7.14: https://runestone.academy/runestone/books/published/pythonds/Trees/toctree.html

Review: Binary Search Tree Nodes

BST Nodes have at most 2 children, called left child and right child

Nodes in left subtree have value less than this node

Nodes in right subtree have value more than this node

This is true for every node!

Binary Search Tree Code from last time

Here’s the code we came up with in class last time

class BSTNode:
    def __init__(self,initdata):
        self.data = initdata
        self.left = None
        self.right = None

        
    def __repr__(self):
        return repr(self.data)+"("+repr(self.left)+","+repr(self.right)+")"
        
        
    def __str__(self):
        str_rep = ""
        if self.left != None:
            str_rep += " "+str(self.left)
        str_rep += " "+str(self.data)
        if self.right != None:
            str_rep += " "+str(self.right)
        return str_rep
            
        
    def getData(self):
        return self.data

    def setData(self,newdata):
        self.data = newdata
        
    def getLeft(self):
        return self.left

    def setLeft(self,newleft):
        self.left = newleft
        
    def getRight(self):
        return self.right

    def setRight(self,newright):
        self.right = newright
        
    #put this method into BSTNode class
    def add(self,new_val):
        #implement the code here
        
        if new_val < self.data:
            if self.left == None:
                new_node = BSTNode(new_val)
                self.setLeft( new_node  )
            else:
                self.getLeft().add(new_val)
        elif new_val > self.data:
            if self.right == None:
                new_node = BSTNode(new_val)
                self.setRight( new_node )
            else:
                self.getRight().add(new_val)
        else:
            raise Exception("trying to add duplicate "+str(new_val))
                
        
        
class BSTSet:
    def __init__(self):
        self.__root = None #start with an empty tree!
        
    def add(self,new_val):
        if self.__root == None:
            self.__root = BSTNode(new_val)
        else:
            self.__root.add(new_val)
    
    def __repr__(self):
        return repr(self.__root)
    
    def __str__(self):
        return str(self.__root)
            
myset = BSTSet()
myset.add(4)
myset.add(7)
myset.add(2)
myset.add(9)
print(repr(myset))
print(str(myset))
4(2(None,None),7(None,9(None,None)))
  2 4  7  9

Group Activity Problem 1

Implement a search method in the BSTNode class - again use recursion.

Write the associated method in BSTSet (suggestion: call it __contains__ in BSTSet so that you can use it with the in operator.

myset = BSTSet()
myset.add(4)
myset.add(7)
myset.add(2)
myset.add(9)

print( 7 in myset ) #should be True
print( 1 in myset ) #should be False

Group Activity Problem 2

What would the tree look like in a worst-case search? So then what is the computational complexity of your search? What about add?

After you a good guess, check out https://runestone.academy/ns/books/published/pythonds/Trees/SearchTreeAnalysis.html to see if you agree with the book.

Removing Items

Removing nodes is the most difficult part.

Three cases to consider:

  1. The node has no children
    • set the reference to it from the parent to None
  1. The node has 1 child
    • set the reference from the parent to the 1 child
  1. The node has 2 children
    • replacee this node’s value with that of its successor
    • delete the node that the successor value came from

Other Considerations

  • the code will look different depending on whether the node being removed is itself a left child, a right child, or the root
  • we haven’t been keeping track of the number of items - how could wee do that
  • we might be able to make a BST to implement an ordered list ADT (defind here: https://runestone.academy/ns/books/published/pythonds/BasicDS/TheOrderedListAbstractDataType.html ).
    • The main difficulty is implementing an index method - can you find the number of items in the tree that are less than a given item?

Unbalanced Binary Search Trees

In the worst case, a BST could be completely skewed in one direction

All operations: $O(n)$

Measuring Balance at Each Node

$balanceFactor = height(leftSubTree) - height(rightSubTree)$

AVL Trees

Inventors: G.M. Adelson-Velskii and E.M. Landis

Idea: balance factor is only allowed to be 0, 1, or -1

Approach: Whenever you add or remove items, if the balance factor is > 1 or < -1, perform rotations to get it back in balance (check all nodes back up to the root)

Result: The tree will bee guaranteed to have a height of at most $1.44*\log_2 N$

add, search, and remove will run in $O(\log N)$ time

A simple left rotation

A more complicated right rotation

In this example we

  1. promote C in place of E
  2. move C’s right child (D) and make it into E’s left child
  3. make E the new right child of C

Almost there…

balanceFactor 2 -> right rotation

balanceFactor -2 -> left rotation

Problem:

After left rotation:

Solution

$balanceFactor = -2$ and right child’s $balanceFactor=1$, do right rotation on right child then left rotation at this node

$balanceFactor = 2$ and left child’s $balanceFactor=-1$, do left rotation on left child then right rotation at this node

Summary of Different Set Implementations

Which data structure should you use to implement a Set?

Note: Maps are similar https://runestone.academy/ns/books/published/pythonds/Trees/SummaryofMapADTImplementations.html

Operation

Sorted List

Hash Table*

Binary Search Tree

AVL Tree

add

\(O(n)\)

\(O(1)\)

\(O(n)\)

\(O(\log_2{n})\)

in (search)

\(O(\log_2{n})\)

\(O(1)\)

\(O(n)\)

\(O(\log_2{n})\)

remove

\(O(n)\)

\(O(1)\)

\(O(n)\)

\(O(\log_2{n})\)

  • Hash Table is actually $O(1)$ in the average case, but this is usually achievable with a good hash function and low load factor

Is it ever better to use an AVL Tree instead of a Hash Table as the underlying data structure? Are there memory differences? What happens if you are adding and removing items a lot? What happens when the hash table fills up?