Insertion Sorting Algorithm

CS 65: Introduction to Computer Science I

Sorting

Arranging a collection into some sorted order (ascending/descending, numerical/alphabetical/lexicographical) is a very common problem in computer science

  • Can make output more human-readable
  • It can make searching (including find max/min) much faster
In [1]:
student_grades = [88.3, 53.2, 76.6, 92.2, 81.9, 79.7, 62.1, 84.8, 83.3]
student_grades.sort()
print(student_grades)
[53.2, 62.1, 76.6, 79.7, 81.9, 83.3, 84.8, 88.3, 92.2]

Sorting Algorithms

Well-studied algorithmic problem

Many different algorithms (bubble sort, quicksort, merge sort, heap sort, selection sort, insertion sort, bucket sort, etc.)

  • performance depends on characteristics of the data
  • work with different data structures
  • usually, you want the one that is fastest for your data/application/problem

Later computer science courses cover how to analyze and compare these algorithms.

Insertion Sort

Insertion Sort: is a simple, intuitive sorting algorithm

  • many people use this algorithm when sorting a pile of papers or a hand of playing cards
  • it's not the fastest algorithm - but it's not the worst, either

Works by splitting the list into two parts - a sorted part and an unsorted part

  • Move items one at a time into the sorted part, putting it in the right place

InsertionSortAnimation.gif

In [ ]:
 
In [2]:
student_grades = [88.3, 53.2, 76.6, 92.2, 81.9, 79.7, 62.1, 84.8, 83.3]


#loop through all indices in the list, starting with index 1 (first element is already "sorted")
for current_grade_index in range(1,len(student_grades)):

    print("popping from index",current_grade_index,", length of list:",len(student_grades))
    
    #pop item at current index
    curr_grade = student_grades.pop(current_grade_index)
    print("item popped",curr_grade)
    
    
    #loop from the beginning of the list up until the index we popped from
    position_to_check = 0
    
    while position_to_check < current_grade_index:
        
        #compare popped item with each
        #keep going until we run into one that our item isn't greater than
        #insert the item at that spot
        
        if student_grades[position_to_check] > curr_grade:
            student_grades.insert(position_to_check, curr_grade)
            break
            
        position_to_check += 1
        
    # if we got back to the current spot without finding a place
    # to put this item, then put it back in its original spot
    if position_to_check == current_grade_index:
        student_grades.insert(position_to_check, curr_grade)
    
        

print(student_grades)
popping from index 1 , length of list: 9
item popped 53.2
popping from index 2 , length of list: 9
item popped 76.6
popping from index 3 , length of list: 9
item popped 92.2
popping from index 4 , length of list: 9
item popped 81.9
popping from index 5 , length of list: 9
item popped 79.7
popping from index 6 , length of list: 9
item popped 62.1
popping from index 7 , length of list: 9
item popped 84.8
popping from index 8 , length of list: 9
item popped 83.3
[53.2, 62.1, 76.6, 79.7, 81.9, 83.3, 84.8, 88.3, 92.2]

Another approach

Because both pop and insert require Python to scan through the list, we end up doing more iterating through the list than we need to, which takes extra time.

Here's another approach that eliminates the need to use pop and insert by scanning from right-to-left starting with the item being inserted into the correct place. As we scan, we move the items over one spot at a time until we find the place where this one goes.

In [3]:
student_grades = [88.3, 53.2, 76.6, 92.2, 81.9, 79.7, 62.1, 84.8, 83.3]

for current_grade_index in range(1,len(student_grades)):

    current_grade = student_grades[current_grade_index]
    position = current_grade_index

    #shift grades down the list until we find where this one goes
    while position>0 and student_grades[position-1]>current_grade:
        student_grades[position]=student_grades[position-1]
        position = position-1

    student_grades[position]=current_grade
    
print(student_grades)
[53.2, 62.1, 76.6, 79.7, 81.9, 83.3, 84.8, 88.3, 92.2]

Group Exercise

Let's improve this code. Do the following:

  1. rename the variables so that it could work with any list and make sense (e.g., what if I wanted to sort zip codes instead of grades?)
  2. write the code as a function that lets you pass the list as an argument and returns the sorted list (it's ok if the argument list is changed, but it should also be returned)
In [ ]: