Kasai Algorithm for Constructing LCP-Array Practical Example
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Kasai's algorithm builds the LCP array from a string and its suffix array in linear time. The LCP array stores the longest common prefix length between adjacent suffixes in sorted order, which makes it useful for repeated-substring queries, suffix-array search, and many other string algorithms. The practical value of Kasai's method is that it avoids recomputing prefix matches from scratch for every adjacent suffix pair.
What the LCP Array Represents
Suppose the string is banana. Its sorted suffixes are:
- '
aat index5' - '
anaat index3' - '
ananaat index1' - '
bananaat index0' - '
naat index4' - '
nanaat index2'
So the suffix array is [5, 3, 1, 0, 4, 2].
The LCP array records the common prefix lengths between neighboring suffixes in that order:
- LCP of
aandanais1 - LCP of
anaandananais3 - LCP of
ananaandbananais0 - LCP of
bananaandnais0 - LCP of
naandnanais2
A common convention is [0, 1, 3, 0, 0, 2], where the first slot is 0 because there is no previous suffix.
The Key Insight Behind Kasai's Algorithm
If two suffixes share a common prefix of length h, then after moving one position to the right in the original string, the next relevant comparison can start with at least h - 1 matched characters.
That is the whole trick. Instead of rechecking from the first character every time, the algorithm carries forward useful information from the previous comparison.
To do that efficiently, it builds a rank array where rank[i] tells you where the suffix starting at index i appears in the suffix array.
A Runnable Python Implementation
This prints:
Step Through the banana Example
The most educational part is how h evolves.
When the algorithm compares suffixes starting at 3 and 5, it sees ana and a, so the common prefix length is 1. Later, when it moves to the suffix starting at 1, it already knows the next useful comparison can begin with h = 0 or h = 1 - 1, not from some arbitrary guess.
The large gain comes from the fact that each character match contributes to h only a limited number of times across the entire run. That is why the total complexity stays linear.
Why the Time Complexity Is O(n)
At first glance, the nested while loop looks suspicious. It seems like it could compare many characters for many suffixes.
The reason it is still O(n) is that h only increases when new character matches are discovered, and after each outer iteration it decreases by at most 1. Across the whole algorithm, the total number of character comparisons is bounded linearly by the string length.
This is the same style of amortized argument that appears in many efficient string algorithms.
Where LCP Arrays Are Used
Once you have the suffix array and LCP array together, you can:
- find repeated substrings efficiently
- speed up pattern matching over suffix arrays
- compute the longest repeated substring
- support some compressed indexing techniques
Kasai's algorithm is important because it makes the LCP array cheap enough to be part of normal suffix-array workflows.
Common Pitfalls
The biggest mistake is mixing two different LCP conventions and then thinking the algorithm is wrong. Some implementations store the LCP against the previous suffix, while others describe it against the next suffix. The indexing convention must stay consistent.
Another mistake is using rank[i] + 1 where the implementation expects comparison with the previous suffix at rank[i] - 1.
A third issue is assuming the while loop makes the algorithm quadratic. The amortized analysis is what keeps the total work linear.
Summary
- Kasai's algorithm builds the LCP array from a string and suffix array in
O(n)time - The
rankarray maps original suffix positions into suffix-array order - The reused prefix length
his the key optimization - On
banana, the LCP array is[0, 1, 3, 0, 0, 2] - LCP arrays are useful for repeated-substring queries, suffix-array search, and other string-processing tasks

