Kasai Algorithm
LCP Array
Suffix Array
Algorithm Example
String Processing

Kasai Algorithm for Constructing LCP-Array Practical Example

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Kasai's algorithm builds the LCP array from a string and its suffix array in linear time. The LCP array stores the longest common prefix length between adjacent suffixes in sorted order, which makes it useful for repeated-substring queries, suffix-array search, and many other string algorithms. The practical value of Kasai's method is that it avoids recomputing prefix matches from scratch for every adjacent suffix pair.

What the LCP Array Represents

Suppose the string is banana. Its sorted suffixes are:

  • 'a at index 5'
  • 'ana at index 3'
  • 'anana at index 1'
  • 'banana at index 0'
  • 'na at index 4'
  • 'nana at index 2'

So the suffix array is [5, 3, 1, 0, 4, 2].

The LCP array records the common prefix lengths between neighboring suffixes in that order:

  • LCP of a and ana is 1
  • LCP of ana and anana is 3
  • LCP of anana and banana is 0
  • LCP of banana and na is 0
  • LCP of na and nana is 2

A common convention is [0, 1, 3, 0, 0, 2], where the first slot is 0 because there is no previous suffix.

The Key Insight Behind Kasai's Algorithm

If two suffixes share a common prefix of length h, then after moving one position to the right in the original string, the next relevant comparison can start with at least h - 1 matched characters.

That is the whole trick. Instead of rechecking from the first character every time, the algorithm carries forward useful information from the previous comparison.

To do that efficiently, it builds a rank array where rank[i] tells you where the suffix starting at index i appears in the suffix array.

A Runnable Python Implementation

python
1def kasai_lcp(s, sa):
2    n = len(s)
3    rank = [0] * n
4    lcp = [0] * n
5
6    for i, suffix_start in enumerate(sa):
7        rank[suffix_start] = i
8
9    h = 0
10    for i in range(n):
11        if rank[i] == 0:
12            h = 0
13            continue
14
15        j = sa[rank[i] - 1]
16
17        while i + h < n and j + h < n and s[i + h] == s[j + h]:
18            h += 1
19
20        lcp[rank[i]] = h
21
22        if h > 0:
23            h -= 1
24
25    return lcp
26
27
28s = "banana"
29sa = [5, 3, 1, 0, 4, 2]
30print(kasai_lcp(s, sa))

This prints:

python
[0, 1, 3, 0, 0, 2]

Step Through the banana Example

The most educational part is how h evolves.

When the algorithm compares suffixes starting at 3 and 5, it sees ana and a, so the common prefix length is 1. Later, when it moves to the suffix starting at 1, it already knows the next useful comparison can begin with h = 0 or h = 1 - 1, not from some arbitrary guess.

The large gain comes from the fact that each character match contributes to h only a limited number of times across the entire run. That is why the total complexity stays linear.

Why the Time Complexity Is O(n)

At first glance, the nested while loop looks suspicious. It seems like it could compare many characters for many suffixes.

The reason it is still O(n) is that h only increases when new character matches are discovered, and after each outer iteration it decreases by at most 1. Across the whole algorithm, the total number of character comparisons is bounded linearly by the string length.

This is the same style of amortized argument that appears in many efficient string algorithms.

Where LCP Arrays Are Used

Once you have the suffix array and LCP array together, you can:

  • find repeated substrings efficiently
  • speed up pattern matching over suffix arrays
  • compute the longest repeated substring
  • support some compressed indexing techniques

Kasai's algorithm is important because it makes the LCP array cheap enough to be part of normal suffix-array workflows.

Common Pitfalls

The biggest mistake is mixing two different LCP conventions and then thinking the algorithm is wrong. Some implementations store the LCP against the previous suffix, while others describe it against the next suffix. The indexing convention must stay consistent.

Another mistake is using rank[i] + 1 where the implementation expects comparison with the previous suffix at rank[i] - 1.

A third issue is assuming the while loop makes the algorithm quadratic. The amortized analysis is what keeps the total work linear.

Summary

  • Kasai's algorithm builds the LCP array from a string and suffix array in O(n) time
  • The rank array maps original suffix positions into suffix-array order
  • The reused prefix length h is the key optimization
  • On banana, the LCP array is [0, 1, 3, 0, 0, 2]
  • LCP arrays are useful for repeated-substring queries, suffix-array search, and other string-processing tasks

Course illustration
Course illustration