kth quantiles
n-elements set
Cormen
algorithm
data analysis

Finding the kth quantiles of an n-elements set. From cormen

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

In the realm of statistics, dividing a data set into equal parts gives insight into the distribution of data. Typically, you might hear about percentiles or quartiles, which divide data into 100 or 4 parts, respectively. However, this technique can be generalized into finding the kth quantiles of a set containing n elements. The concept is rooted deeply in algorithms, and a comprehensive understanding can be gathered from the works of Thomas Cormen among others.

Basics of Quantiles

A quantile of a data set is essentially a point or a set of points that divides the data into intervals with equal probability. For instance:

  • Median (2-quantile): Divides the set into two equal halves.
  • Quartiles (4-quantiles): Divide the set into four equal segments.
  • Percentiles (100-quantiles): Divide the set into one hundred equal segments.

The kth quantile is thus the general form, where k can be any integer that makes sense for the specific data analysis.

Finding Kth Quantiles

Problem Definition

Given a set S of n elements and a positive integer k , the task is to partition S into k segments of nearly equal numbers of elements. More formally, you want to find k-1 order statistics in S such that these statistics divide S into k subsets containing either floor(n/k) or ceil(n/k) elements.

Approach to Finding Kth Quantiles

The approach can be visualized and executed through a combination of sorting and selecting. Here's a step-by-step outline:

  1. Sort the Array: Sort the array S of n elements. Let's assume we now have S sorted.
  2. Determine Segment Size: Calculate the size of each segment, given by m = n/k . In most cases, this will give a non-integer value, meaning that some segments may have an extra element.
  3. Select Kth Quantile Points:
    • Iterate over the sorted array.
    • Select the (i*m) -th element as the (i) th quantile, where i1,2,...,k1i \in {1, 2, ..., k-1} .
  4. Handle Uneven Segments: If n % k != 0 , some segments will contain one more element than others. It's important to account for this when planning the segment sizes.

Implementation Example

Let's implement an example in Python to demonstrate finding the quartiles (k=4) of a given set:

  • Sorting: O(nlogn)O(n \log n), where n is the number of elements.
  • Selection: O(1)O(1) for each quantile, because the sorted array allows direct indexing. This results in O(k)O(k) for finding all k-1 quantiles.

Course illustration
Course illustration

All Rights Reserved.