Learn about heap data structure and how it is used for sorting
Introduction
Heap is a data structure that represents an array in a binary tree-based format. Heap imposes the following rules for its structure:
Completeness. Every level of the heap is completely filled with elements. However, the last level may be partially filled with elements starting from the left.Heap rule. The value of any parent’s node must be less or equal to the values of its children. If this property is satisfied, the heap is called min-heap. There also exists a max-heap variation for which parents’ values must be greater than children’s values.
Examples and code in this article will be provided for min heaps. The algorithm workflow for max heaps looks very similar. An example of a min-heap is shown below.
Heap is usually stored in the form of an array. If a parent has an index i, then the positions of its left and right children are 2 * i + 1 and 2 * i + 2 respectively. Inversely, if a non-root node has an index i, then its parent’s node has an index (i – 1) // 2. Following this principle, we get the array representation of the heap above:
Operations
Heap supports several operations:
Inserting a nodeBuilding a heap from an arrayExtracting a node with the minimum valueSorting
Since the heap data structure has several operations, it is more practical to implement it as a class. For now, we are going to implement its basis. After each operation, a corresponding code snippet will be provided.
https://medium.com/media/98484c83d1115b010e50dbd1f62b0d54/hrefheap field stores an input array in the form of a heap (will be implemented later)The _swap() method takes two indices of the array and swaps their values.The _number_of_children() method returns the number of children a node has (0, 1 or 2)
Inserting a node
The new element of the heap is inserted at the last position. Such insertion can break the heap rule if the new element is lower than the parent’s value. To avoid this problem, the new node is recursively propagated up until the moment when it does not violate the heap rule. The described procedure is called heapify(up).
From the figure above we insert a node with the value 3.
After the insertion, the heap rule is broken, since 3 < 15 (parent). We swap elements 3 and 15.Now node 3 has a new parent which has value 7. Again, the heap rule is not satisfied, since 3 < 7. As a result, we swap 3 and 7.Node 3 stands at index 2 and has a parent with the value 1. Since 3 ≥ 1, the heap rule is correct. At this stage, the insertion procedure ends.
Let us find out the time complexity for insertion. The worst case is possible when it is necessary to propagate the new node from the bottom to the top level of a tree. Since the height of any tree is logarithmic in relation to the total number of its elements N and each comparison takes O(1) time, the final estimation results in O(logN) time.
https://medium.com/media/e62f485f6eac0123f5115da6db7ca6f2/hrefThe insert() method appends the value to the heap and then calls the method to heapify it up.The _heapify_up() method recursively invokes itself on a parent node until the heap rule is correct.
Building a heap
For each element of an input array, an insert procedure is called. This is the way a heap is built.
Speaking of complexity, it might seem that building a heap requires O(N * logN) of time because for each of N elements, we call a function that works by O(logN) time. However, it is possible to improve that estimation and prove mathematically that the total time is O(N).
https://medium.com/media/a572e8f398065be2ef9c68d0d7b26fe2/hrefFor the passed array in the build() method, the heap is built through insertion calls.
Extracting a node with the minimum value
The minimum node is located at the top of the heap. We extract the minimum and replace the top node with the last node of the heap. The heap rule is violated, so we propagate this element down. The algorithm is very similar to the previous one we used above (during insertion elements were propagated up): at each step, we swap the current element with a child who has the minimum value. This procedure lasts until the heap rule is no longer broken or the current element does have a child.
In the figure above, the node with value 1 was extracted and the last node with value 15 took its place.
Since node 15 violates the heap rule, we swap it with its smallest child which is 3.Then node 15 has children 7 and 8 which are inferior. Again, we swap 15 with the smallest child which is 7.After that, 15 stands at index 5 and has an only child which is 20. Since 15 ≤ 20, we stop the heapify procedure.
Similar to the heapify algorithm in the insertion section, this one has the same asymptotics and proceeds for O(logN) time.
https://medium.com/media/f00902c077a92c7ef5e06a077090cb57/href
Sorting
Sorting is implemented via minimum node extraction. While the heap is not empty, we call the extract_min() function and append every minimum element to a new array. This way the array will consist of sorted elements.
Since the heap contains N nodes and extract_min() works for O(logN), the total sorting time sorting takes O(N * logN).
https://medium.com/media/d0206dfcb46734dd0d8a8fb4dc276843/href
Conclusion
We have covered all four main operations for heaps. To sort an array using a heap data structure, it is necessary to first build a heap and then call the sort method on it. Building a heap requires O(N) time and sorting takes O(N * logN) time which finally results in O(N * logN) asymptotics for heap sort.
The full implementation of the Heap class is shown below.
https://medium.com/media/0298a0ab532c9e27dc3b0859e27aaa6e/href
All images unless otherwise noted are by the author
Overview of Sorting Algorithms: Heap Sort was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.