Counting inversions in an array

Inversion counting

array algorithms

data structures

sorting techniques

computational complexity

Counting inversions in an array

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

Counting inversions in an array is a classic problem in computer science with applications in various fields such as sorting theory, information retrieval, and computational biology. An inversion is a pair of indices (i, j) in an array such that i < j and the array element at i is greater than the array element at j. Counting these inversions helps understand how far an array is from being sorted.

Problem Definition and Importance

Given an array of n integers, the goal is to determine the number of inversions. This measure indicates the degree of disorder within the array. An inversion count of 0 means the array is sorted, while a maximum inversion count signifies a completely reversed array.

Inversions are crucial for:

Sorting Algorithms: Understanding inversions can optimize or analyze sorting algorithms.
Genome Analysis: Inversions correspond to genome mutations where segments are reversed.
Rank Correlation: It is used to calculate the Kendall tau distance between two rankings.

Naive Approach

The simplest way to count inversions is by using a brute-force approach. This involves checking every pair of elements in the array.

Algorithm

Initialize inv_count = 0.
Iterate through the array with two loops, i from 0 to n-2 and j from i+1 to n-1.
For each pair (i, j), if array[i] > array[j], increment inv_count.
Return inv_count.

Complexity Analysis

This approach has a time complexity of $O(n^2)$ , which is inefficient for large arrays due to its quadratic nature.

Example

Consider the array [2, 4, 1, 3, 5].

Inversions are (2, 1), (4, 1), and (4, 3).
Total inversions = 3.

Efficient Approach: Merge Sort-Based Method

A more efficient approach utilizes a modified version of the merge sort algorithm. The key idea is to count inversions during the merge process.

Algorithm

Base Case

If the array has one or zero elements, return 0 inversions.

Recursive Case

Divide the array into two halves.
Recursively count inversions in the left half, right half, and during the merge process.
During merging, count inversions where the left element is greater than the right element, as these represent valid inversions across the divided parts.

Merge Function

While merging two sorted halves, for any element from the left half that is greater than an element from the right half, every remaining element in the left half will also be greater.

Pseudocode for Merge and Count

1function merge_and_count(arr, temp_arr, left, mid, right):
2    i = left    // Starting index for left subarray
3    j = mid + 1 // Starting index for right subarray
4    k = left    // Starting index to be merged
5    inv_count = 0
6
7    while i <= mid and j <= right:
8        if arr[i] <= arr[j]:
9            temp_arr[k] = arr[i]
10            i += 1
11        else:
12            temp_arr[k] = arr[j]
13            inv_count += (mid-i + 1)
14            j += 1
15        k += 1
16
17    // Copy remaining elements of the left subarray
18    while i <= mid:
19        temp_arr[k] = arr[i]
20        i += 1
21        k += 1
22
23    // Copy remaining elements of the right subarray
24    while j <= right:
25        temp_arr[k] = arr[j]
26        j += 1
27        k += 1
28
29    // Copy sorted subarray into original array
30    for i from left to right:
31        arr[i] = temp_arr[i]
32
33    return inv_count

Complexity Analysis

The modified merge sort has a time complexity of $O(n \log n)$ , making it suitable for large datasets.

Example Implementation

python

1def merge_sort_and_count(arr, temp_arr, left, right):
2    inv_count = 0
3    if left < right:
4        mid = (left + right) // 2
5
6        inv_count += merge_sort_and_count(arr, temp_arr, left, mid)
7        inv_count += merge_sort_and_count(arr, temp_arr, mid + 1, right)
8        inv_count += merge_and_count(arr, temp_arr, left, mid, right)
9
10    return inv_count
11
12def count_inversions(arr):
13    temp_arr = [0] * len(arr)
14    return merge_sort_and_count(arr, temp_arr, 0, len(arr) - 1)
15
16# Example usage
17arr = [2, 4, 1, 3, 5]
18print(f"Number of inversions are {count_inversions(arr)}")

Key Points Summary

Aspect	Details
Definition	An inversion is a pair `(i, j)` where `i < j` and `array[i] > array[j]`.
Naive Approach	Time Complexity: $O(n^2)$ Inefficient for large arrays.
Merge Sort Approach	Time Complexity: $O(n \log n)$ Efficient for large datasets.
Use Cases	Sorting optimization, genome analysis, rank correlation.
Example	Array `[2, 4, 1, 3, 5]` has 3 inversions: `(2, 1)`, `(4, 1)`, `(4, 3)`.

Conclusion

The problem of counting inversions in an array exemplifies how algorithmic efficiency significantly impacts computational tasks. While a naive approach is easy to understand, the merge sort-based method provides a much-needed performance boost for large arrays. This balance between understanding and efficiency underscores much of algorithm design and analysis within computer science.