Overview of Sorting Algorithms: Merge Sort
Understand how divide-and-conquer paradigm can be applied for sorting
Introduction
When it comes to sorting, merge sort is one of the most popular algorithms. It is based on the famous “Divide-and-conquer” paradigm where a big problem is broken down into smaller problems that are easier to solve. The solutions to them are then combined for the original problem.
In this tutorial, we will dive into implementation details and estimate merge sort complexity in terms of Big O notation. For a better understanding, an example will also be provided.
Algorithm
The idea of the algorithm is to start recursively sorting smaller subarrays of the original array. When several subarrays are sorted, they are transformed into a new array of a larger size. This procedure lasts until we get the sorted version of the initial array.
It might seem too complicated at first but it turns out that getting a new sorted array based on already sorted arrays is a relatively fast procedure.
We are going to start by looking at the merge function. After that, we are going to use it for the final version of the algorithm.
Merge function
The merge function takes two sorted subarrays and returns a new sorted array consisting of elements from both of the subarrays. In the code snippet below an array of elements is passed with two sorted subarrays array[left, middle] and array[middle + 1, right].
https://medium.com/media/e9f3ff015254b8b0a851c6e6b16b6aff/href
We loop through elements of both arrays by using two pointers: i and j. By comparing array[i] and array[j] elements on each iteration, we add a lesser element to new_array. After that, a pointer i or j is increased based on the added element so that it can point to the next element.
The loop lasts until the moment when all elements of either subarray have been added to the new_array. Following that, we simply add the rest elements containing larger values of another subarray.
Finally, we copy each element of the sorted new_array into the original array. As a consequence, all elements from indices left to right are sorted in the array.
Let us notice that for two subarrays of length N we linearly pass through each element only once. That results in O(N) complexity for this method.
Merge example
Assume we have an array consisting of two sorted subarrays array[0:3] and array[4:7]. Firstly, we initialise two pointers i and j that point to the least elements. In our case, they point to elements with indices 0 and 4 respectively.
Merge example
Iteration 1. We compare array[0] = 2 and array[4] = 3. Since 2 ≤ 3, then we save 2 as the first element of the new array and increment i by 1, so it points to the next increasing element which is 9.
Iteration 2. We compare array[1] = 9 and array[4] = 3. Since 9 > 3, then we save 3 as a second element of the new array and increment j by 1, so it points to the next increasing element which is 7.
Iteration 3. array[1] = 9 > array[5] = 7. We save 7 as the next element. j is incremented by 1.
…
Iteration 7. j points to the end of the right array. It means that all elements starting from index i in the left array are greater than any element in the right one. Since that, we copy all the rest elements (16 and 20) from the right array to the new array.
After this procedure is completed, all the sorted values of the new array then are copied into the original array.
Sort function
The sort function recursively calls itself to sort separately its left and right halves. When both of the halves are sorted, they are joined together after the merge_sort() function call.
https://medium.com/media/a98a70a4673291118fa077ed9e0a9c51/href
To make a function interface more convenient for a client, the first function call of merge_sort() can be wrapped into another function. This way the client does not have to pass left and right arguments of the function which successfully follows the encapsulation design principle.
https://medium.com/media/a873baf784af86778ec6062d608cd92d/href
Example
The hierarchy call of the merge sort for an array with 8 elements is shown in the figure below. Firstly, we divide the array into two equal parts of length 4. For each of the halves, we call the merge sort again. It results in arrays of size 2. By the way, there is no need to divide these arrays again because any array consisting of a single element is always sorted.
Merge sort example
We continue the algorithm by sorting arrays of size 2. Once two sequential arrays of size 2 are sorted, they are merged into a new sorted one of length 4. This process lasts for all arrays of size 2. Once two sequential arrays of size 4 are sorted, they are analogously merged into a new sorted array of length 8.
We terminate the procedure when all of the elements are sorted.
Complexity
To evaluate complexity, we first need to understand the structure of recursion calls. Suppose we have an array of size N that needs to be sorted.
We divide this array into two halves of size N / 2. Both of the halves are then divided into two smaller halves of size N / 4. As a result, we end up with 4 arrays of size N / 4. Similarly, on the next level, we end up with 8 arrays of size N / 8. We continue this procedure until we reach N arrays of size 1. This process is illustrated in the following figure.
Tree complexity
For each of the illustrated arrays, we need to call a merge procedure which requires O(K) time where K is the array length. Let us calculate the total time complexity for each level of arrays in the figure above.
For the first level, we have a single array of size N which requires O(N) processing time.For the second level, we have 2 arrays of size N / 2. Each of those arrays requires O(N / 2) time. The total time is 2 * O(N / 2) = O(N).For the third level, we have 4 arrays of size N / 4. Each of those arrays requires O(N / 4) time. The total time is 4 * O(N / 2) = O(N).The last level contains N arrays of size 1. Each of those arrays requires O(1) time. The total time is N * O(1) = O(N).
Following that logic, it is clear that each level requires O(N) processing time. Since at each step each array is divided into two halves, it is easy to notice that there are O(logN) levels. This results in the final time complexity which is O(N) * O(logN) = O(N * logN).
Advantages
Efficiency. Merge sort has an overall complexity O(N * logN). This is the best possible complexity among all sorting algorithms that are based on comparing array elements. However, during the merge procedure elements need to be temporarily sorted in another array. The size of this array is the same as the length of both subarrays which results in O(N) memory space.Simplicity. Compared to other efficient sorting algorithms, merge sort can be easily interpreted and implemented.Stable sort. If an initial array has equal elements, then the merge sort algorithm will not change their relative positions. This is an important property that is used by more complex algorithms that have a sorting part in their implementation.
Conclusion
We have looked at one of the most popular sorting algorithms which helped us to understand the principle of the divide-and-conquer paradigm. Apart from that, merge sort elegantly combines its simplicity with efficient complexity and is used in a large number of applications.
All images unless otherwise noted are by the author
Sorting Algorithms, Part 1: Merge Sort was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.