Wednesday, October 28, 2009

[SPOJ] CSUBSEQS

Task: Given K sequences, determine the number of distinct common subsequences of them.

Suppose $\Sigma$ is the alphabet. In this problem, K=4 and $|\Sigma|=26$.

I have two algorithms for this problem:

Algorithm 1: $O(kn^k|\Sigma|)$:

Idea: partition the set of common subsequences by the ending character. If C is a common subsequence, it must either end by 'a', 'b', ..., or 'z'.

Let $S_{i_1,i_2,...,i_k}$ be the set of distinct common subsequences of $s_1[1..i_1], s_2[1..i_2], ..., s_k[1..i_k]$

We have:
$S_{i_1,i_2,...,i_k} = \bigcup_{c \in \Sigma}{c \cup ({S_{p_1-1,p_2-1,...,p_k-1}} \times \{ c \}})$
in which $p_i$ is the index of the last appearance of c in $s_i$.

In other words, the common subsequences ended by c are either:
  • The one-element subsequence "c"

  • A concatenation of a common subsequence of $S_{p_1-1,p_2-1,...,p_k-1}$ and c

Note that the character c must have been appeared in all sequences so far.

Speed up: We could use memoization instead of a bottom-up DP approach since not every entry in the table needs to be computed. Which entries are computed depends on the input string. My submission using memoization and map to store states got the top speed on SPOJ.

Algorithm 2: $O(2^{k+1}n^k)$
Let $S_{i_1,i_2,...,i_k}$ be the set of distinct common subsequences of $s_1[1..i_1], s_2[1..i_2], ..., s_k[1..i_k]$
Partition the common subsequences into to two set:
  • At least there is a string $s_t$ in which the common subsequence does not end at $s_t[i_t]$: we could compute the number of elements of this set by using the Inclusion Exclusion formula.

  • The common subsequence ends at $s_1[i_1], s_2[i_2], ..., s_k[i_k]$: we compute how many new subsequences in this set. This could also be computed using the Inclusion Exclusion formula by looking at the set of indexes $p_1, p_2, ... ,p_k$ in which $p_t$ is the index of the last appearance before $i_t$ of the character $s_t[i_t]$


My solution for algorithm 2 got TLE.

Note
To deal with distinct subsequences, observe that if a subsequence in the set end with a character c, it could always end with the rightmost c. An algorithm often processes indexes array such as p[i] = max{j, j < i AND s[j] = s[j]} (last appearance of the same element, or p[i][c] = max{j, j<=i AND s[j]=c} (last appearance of c)

1 comment: