subsequence
algorithms
dynamic programming
computational theory
string manipulation

Common subsequence of given length

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

In computer science and mathematics, the problem of finding a common subsequence of a given length among two or more sequences is a fascinating area with numerous applications, ranging from bioinformatics and text comparison to data compression and telecommunication. A subsequence is a sequence derived from another sequence by deleting some elements without changing the order of the remaining elements.

A common subsequence of given sequences is a subsequence that appears in the same order in all the sequences being considered, but not necessarily contiguously. Finding such a subsequence of a specified length is a more challenging task than the simpler problem of finding the longest common subsequence (LCS).

Problem Statement

Given two sequences, AA and BB, and a length, kk, the problem is to determine a subsequence of length kk that is common to both AA and BB. This involves checking whether such a subsequence exists and finding at least one example if it does.

Example:

Consider strings $A = \text\{"ABCBDAB"\}$ and $B = \text\{"BDCAB"\}$. A common subsequence of length 3 is "BCA", where characters appear in the same order but may not be contiguous.

Technical Explanation

Dynamic Programming Approach

To solve this problem, dynamic programming can be employed efficiently. Let's define a 3D array dp[i][j][l]dp[i][j][l] where:

ii is the index in string AA, • jj is the index in string BB, • ll is the length of the desired subsequence.

The array dp[i][j][l]dp[i][j][l] will hold a boolean true if there is a common subsequence of length ll using the first ii characters of AA and the first jj characters of BB, otherwise false.

Initialization:

  1. dp[i][j][0]=Truedp[i][j][0] = \text{True} for all ii and jj, because a subsequence of length 0 is trivially common.
  2. $dp[0][j][l] = \text\{False\}$ and $dp[i][0][l] = \text\{False\}$ for $l > 0$, as there is no non-zero subsequence in a zero-length string.

Transition:

For 1ilength of A1 \le i \le \text{length of A}, 1jlength of B1 \le j \le \text{length of B}, and 1lk1 \le l \le k:

• If A[i1]==B[j1]A[i-1] == B[j-1]: dp[i][j][l]=dp[i1][j1][l1]dp[i1][j][l]dp[i][j1][l]dp[i][j][l] = dp[i-1][j-1][l-1] \lor dp[i-1][j][l] \lor dp[i][j-1][l]

• Otherwise: dp[i][j][l]=dp[i1][j][l]dp[i][j1][l]dp[i][j][l] = dp[i-1][j][l] \lor dp[i][j-1][l]

You should finally check the value of dp[length of A][length of B][k]dp[\text{length of A}][\text{length of B}][k] to determine if a common subsequence of length kk exists.

Complexity

Time Complexity: O(n×m×k)O(n \times m \times k) where nn and mm are the lengths of sequences AA and BB, respectively. • Space Complexity: Also O(n×m×k)O(n \times m \times k) for storing the dynamic programming table.

Example Walkthrough

Consider the sequences $A = \text\{"AGGTAB"\}$ and $B = \text\{"GXTXAYB"\}$, trying to find a common subsequence of length 4.

Step-by-Step:

  1. Initialization:

Genome Sequencing: Comparing DNA sequences to find common mutations or evolutionary paths. • Spell Checkers: Detecting differences between two texts for correction suggestions. • Data Compression: Identifying redundant sequences in data for compression.


Course illustration
Course illustration

All Rights Reserved.