Sunday, November 1, 2009

[SPOJ] MKPAIRS

This task is an application of the y-intercept optimization problem in the previous post.

From an initial $O(n^3)$ DP algorithm, we could speed up to $O(n^2 \log n)$, which still got TLE on SPOJ.
Since for each hull, all the query slopes are decreasing, we could speed up the algorithm further to $O(n^2)$.

See the USACO analysis for more details.

The y-intercept optimization problem

Given a set of pairs $(x_1, y_1), ..., (x_k, y_k)$ and a number $a$, how do we find $\min_i \{ax_i+y_i\}$?

This problem can be viewed geometrically as the following.
Consider $(x_1, y_1), ..., (x_k, y_k)$ as points on the plane.

For a given $a$, the line equation of the line passing through $(x_i, y_i)$ with slope $-a$ is:
$y = a(x_i-x) + y_i$
When $x=0$, $y=ax_i + y_i$, thus $ax_i+y_i$ is the y-intercept of the line passing through $(x_i, y_i)$ with slope $-a$

Our problem becomes finding the minimum y-intercept among the lines passing through $(x_i, y_i)$ with slope $-a$.

It can be proved (and is intuitively clear) that we only need to keep the convex hull of $(x_1, y_1), ..., (x_k, y_k)$.

The problem of maximizing the y-intercept is similar.

For the minimization problem, we only need to keep the lower hull. It is a set of connected segments with increasing slopes:



If we need to query the minimum intercept over lines with slope $a$, the optimum point $(x_i, y_i)$ is the point such that $a$ lies between the slope of segment $(i-1, i)$ and the slope of segment $(i, i+1)$ (see figure above).

In other words, to query the minimum intercept, we can use binary search on the sequence of slopes.

If we know that the slope being queried is always increasing, we can just go through the set of segments linearly.

To keep track of the lower hull, if the given points are in increasing order of x, we can use a stack and obtain amortized linear time:

for i ← 1 to k do
    while size(s) ≥ 2 and (s[top-1], s[top], p[i]) is a right turn do
        pop(s)
    s.push(p[i])
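
The stack maintenance and the linear walk for monotone queries can be sketched in C++ as follows; `Pt`, `lowerHull`, and `minIntercept` are names of my own choosing, and the walk assumes successive queried values of $a$ are decreasing, so the optimum index only moves forward:

```cpp
#include <vector>
using namespace std;
typedef long long ll;

struct Pt { ll x, y; };

// z-component of (a-o) x (b-o); > 0 means a left (counterclockwise) turn
ll cross(const Pt& o, const Pt& a, const Pt& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Stack-based lower hull; pts must be sorted by increasing x
vector<Pt> lowerHull(const vector<Pt>& pts) {
    vector<Pt> s;
    for (const Pt& p : pts) {
        // pop while (s[top-1], s[top], p) is a right turn (or collinear)
        while (s.size() >= 2 && cross(s[s.size()-2], s[s.size()-1], p) <= 0)
            s.pop_back();
        s.push_back(p);
    }
    return s;
}

// min{a*x_i + y_i} over hull points; ptr persists across calls and only
// advances, which is valid when the queried a's are decreasing
ll minIntercept(const vector<Pt>& hull, ll a, int& ptr) {
    while (ptr + 1 < (int)hull.size() &&
           a * hull[ptr+1].x + hull[ptr+1].y <= a * hull[ptr].x + hull[ptr].y)
        ++ptr;
    return a * hull[ptr].x + hull[ptr].y;
}
```

For queries with arbitrary slopes, replace the pointer walk with a binary search over the segment slopes.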

A direct application of this problem is the task "Line Country" from NUS Selection Test #2, 2009. Since the slope being queried is not always increasing, we need to use binary search, obtaining an $O(n \log n)$ algorithm.

Saturday, October 31, 2009

[SPOJ] PALDR (Solved)

PALDR Solved
Same algorithm, but with the palindrome check replaced by hashing (using the int64 range as the hash range):

for (int j = i; j < n; ++j) {
    h = 31*h + s[j];    // forward hash of s[i..j]
    hR = hR + p*s[j];   // reverse hash of s[i..j]
    p *= 31;            // p = 31^(j-i+1)

    // length of s[i..j] is even and the hashes match => even palindrome
    if ((j - i) % 2 == 1 && h == hR) {
        cont = true;    // found the next even palindromic prefix
        i = j + 1;      // continue after it
        break;
    }
}

(Here h = hR = 0 and p = 1 before the loop.)
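
Put in context, a self-contained sketch of the whole check; `allEvenPalindromes` is my own name, and I assume the algorithm referenced above greedily strips the shortest even-length palindromic prefix:

```cpp
#include <string>
using namespace std;
typedef unsigned long long u64; // unsigned so overflow wraps mod 2^64

// Greedily strip the shortest even-length palindromic prefix, testing
// palindromicity by comparing forward and reverse rolling hashes.
bool allEvenPalindromes(const string& s) {
    int n = s.size(), i = 0;
    while (i < n) {
        u64 h = 0, hR = 0, p = 1;
        bool cont = false;
        for (int j = i; j < n; ++j) {
            h = 31*h + s[j];   // hash of s[i..j]
            hR = hR + p*s[j];  // hash of reverse(s[i..j])
            p *= 31;
            if ((j - i) % 2 == 1 && h == hR) { // even length, hashes match
                cont = true;
                i = j + 1;     // restart after this prefix
                break;
            }
        }
        if (!cont) return false; // no even palindromic prefix from i
    }
    return true;
}
```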

[SPOJ] MYSTIC

Let N be the number of types.

We have T trials and N types. Type N must occur $u \geq need[N]$ times in the sequence of trials. Removing these occurrences of type N, we obtain a subproblem with $T-u$ trials and $N-1$ types. This leads us to the following DP solution:

Let C(S,T) be the number of outcomes to satisfy the first S types when we have T trials. We have:
$C(S,T) = \sum_{u \geq need[S]}{\binom{T}{u} \times C(S-1,T-u)}$

We would like to compute the probability, not the cardinality C(S,T).
Let $F(S,T) = \frac{C(S,T)}{N^T}$
We have:
$F(S,T) = \frac{\sum_{u \geq need[S]}{\binom{T}{u} \times F(S-1,T-u)\times N^{T-u}}}{N^T}$
$= \sum_{u \geq need[S]}{\frac{\binom{T}{u} \times F(S-1,T-u)}{N^u}}$

F(N, G) is the result probability.

Note that F(S, T) is not itself a probability value: the correct probability for the subproblem would be $\frac{C(S, T)}{S^T}$, not $\frac{C(S, T)}{N^T}$. Defining F(S,T) this way is a trick to get the result probability faster.

Probability and cardinality can be used interchangeably.

When constructing a DP solution, use cardinality to be clearer. After that, divide the cardinality by the total number of outcomes to get the probability function.

Of course, we could also define $F(S,T) = \frac{C(S,T)}{S^T}$, which is the correct probability value; a little more computation is then required.
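
The recurrence can be sketched in code as follows (`solveMystic` is a hypothetical name; `need` is 1-indexed, and plain doubles stand in for whatever precision the task requires):

```cpp
#include <vector>
using namespace std;

// Sketch: F[s][t] = C(s,t) / N^t as defined above; the answer F[N][G] is
// the probability that in G trials over N equally likely types, type t
// occurs at least need[t] times for every t (need indexed 1..N).
double solveMystic(int N, int G, const vector<int>& need) {
    // binom[t][u] = C(t, u) by Pascal's rule (entries above the diagonal stay 0)
    vector<vector<double>> binom(G + 1, vector<double>(G + 1, 0.0));
    for (int t = 0; t <= G; ++t) {
        binom[t][0] = 1.0;
        for (int u = 1; u <= t; ++u)
            binom[t][u] = binom[t-1][u-1] + binom[t-1][u];
    }
    vector<vector<double>> F(N + 1, vector<double>(G + 1, 0.0));
    F[0][0] = 1.0; // C(0,0) = 1, C(0,t) = 0 for t > 0
    for (int s = 1; s <= N; ++s)
        for (int t = 0; t <= G; ++t) {
            double sum = 0.0, npow = 1.0; // npow = N^u
            for (int u = 0; u <= t; ++u) {
                if (u >= need[s]) sum += binom[t][u] * F[s-1][t-u] / npow;
                npow *= N;
            }
            F[s][t] = sum;
        }
    return F[N][G];
}
```

For instance, with N = 2 types, G = 2 trials, and need = {1, 1}, only the sequences 12 and 21 out of 4 qualify, giving probability 0.5.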

[SPOJ] VPALIN

Let $P=A^m=\underset{m \: times}{\underbrace{AA..A}}$
It's easy to prove that: if P is a palindrome then A is also a palindrome

Let Base(P) = the shortest string A such that $P=A^m$
Let P, Q be two palindromes. We can prove that:
PQ is a palindrome iff Base(P) = Base(Q)

Therefore, the problem is reduced to finding the period of a string, which can be done in O(n) using KMP's Next table.
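
A sketch of computing the length of Base(P) via the Next (failure) table; `baseLength` is my own name:

```cpp
#include <string>
#include <vector>
using namespace std;

// Length of Base(s): the shortest A with s = A^m, via the KMP failure table.
// f[i] = length of the longest proper border of s[0..i-1]; the candidate
// period is n - f[n], and it is a true period iff it divides n.
int baseLength(const string& s) {
    int n = s.size();
    if (n == 0) return 0;
    vector<int> f(n + 1, 0);
    for (int i = 1; i < n; ++i) {
        int j = f[i];
        while (j > 0 && s[i] != s[j]) j = f[j]; // fall back along borders
        f[i + 1] = (s[i] == s[j]) ? j + 1 : 0;
    }
    int p = n - f[n];
    return (n % p == 0) ? p : n; // otherwise s is its own base
}
```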

To efficiently count the number of appearances of a base, we can use a hash function from a string into the int64 range:

typedef long long int64;

int64 hash(char *s, int n) {
    int64 h = 0;
    for (int i = 0; i < n; ++i) h = 31*h + s[i];
    return h;
}

[SPOJ] Bookcase

Observe that: one of the three cases' maximum heights is always the largest height among all books

There are 3 bookcases; let H1, H2, H3 be the maximum height of a book on cases 1, 2, 3
WLOG, we always put the book with the largest height on case 3
Let W1 be the width of case 1 and W2 be the width of case 2
=> the width of case 3 = totalWidth - W1 - W2

Thus, if for every possible pair of values (W1, W2) we can compute min{H1 + H2}, we are done.
This can be computed by DP:
let f[W1, W2] be the minimum value of H1+H2, among all possible arrangements of books such that case 1 has width W1, case 2 has width W2, and the book with the largest height is on case 3.

To see this, consider the 1D version: find a subset of books such that the sum of widths is W and the maximum height is minimized. Sort the books in decreasing order of height. Let F[i][w] = min{max height | sum of widths = w, using first i books}

If width[i] <= w:
F[i][w] = min(F[i-1][w], width[i] < w ? F[i-1][w-width[i]] : height[i])
otherwise F[i][w] = F[i-1][w].
(Because the books are sorted in decreasing height, appending book i to a non-empty selection never changes the maximum; book i pays its own height only when it is the first book selected, i.e. when w = width[i].)

The 2D case is similar: F[i][w1][w2]
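
The 1D recurrence can be sketched as follows (`minMaxHeight` is a hypothetical helper; books are (height, width) pairs and INF marks unreachable widths):

```cpp
#include <algorithm>
#include <utility>
#include <vector>
using namespace std;

const int INF = 1e9; // no subset of books reaches this width

// F[w] = minimum possible maximum height over subsets of books whose
// widths sum to exactly w. Books sorted by decreasing height, so the
// first book chosen fixes the maximum height of the selection.
vector<int> minMaxHeight(vector<pair<int,int>> books, int W) {
    sort(books.rbegin(), books.rend()); // decreasing (height, width)
    vector<int> F(W + 1, INF);
    F[0] = 0;
    for (const auto& b : books) {
        int h = b.first, wd = b.second;
        for (int w = W; w >= wd; --w) { // 0/1-knapsack order
            // b pays its height only when it is the sole book so far;
            // otherwise the earlier, taller books determine the maximum
            int cand = (w == wd) ? h : F[w - wd];
            F[w] = min(F[w], cand);
        }
    }
    return F;
}
```

The 2D table f[W1][W2] for the bookcase task follows the same pattern with one more width dimension.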

Wednesday, October 28, 2009

[SPOJ] CSUBSEQS

Task: Given K sequences, determine the number of distinct common subsequences of them.

Suppose $\Sigma$ is the alphabet. In this problem, K=4 and $|\Sigma|=26$.

I have two algorithms for this problem:

Algorithm 1: $O(kn^k|\Sigma|)$:

Idea: partition the set of common subsequences by their last character. If C is a common subsequence, it must end with 'a', 'b', ..., or 'z'.

Let $S_{i_1,i_2,...,i_k}$ be the set of distinct common subsequences of $s_1[1..i_1], s_2[1..i_2], ..., s_k[1..i_k]$

We have:
$S_{i_1,i_2,...,i_k} = \bigcup_{c \in \Sigma}{\left(\{c\} \cup ({S_{p_1-1,p_2-1,...,p_k-1}} \times \{ c \})\right)}$
in which $p_t$ is the index of the last appearance of $c$ in $s_t[1..i_t]$.

In other words, the common subsequences ending in c are either:
  • The one-element subsequence "c"

  • A concatenation of a common subsequence of $S_{p_1-1,p_2-1,...,p_k-1}$ and c

Note that the character c must have appeared in all the sequence prefixes considered so far.

Speed up: we can use memoization instead of a bottom-up DP approach, since not every entry in the table needs to be computed; which entries are computed depends on the input strings. My submission using memoization and a map to store states got the top speed on SPOJ.
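
To illustrate, a sketch of Algorithm 1 specialized to k = 2 (`countCommon` is my own name; the real task has k = 4 and needs big-integer counts, so long long here is only for the sketch):

```cpp
#include <functional>
#include <string>
#include <vector>
using namespace std;

// Number of distinct non-empty common subsequences of a and b (lowercase
// letters assumed), counted by last character as in the recurrence above.
long long countCommon(const string& a, const string& b) {
    int n = a.size(), m = b.size();
    // lastA[i][c] = largest 1-based p <= i with a[p-1] == 'a'+c, or 0
    vector<vector<int>> lastA(n + 1, vector<int>(26, 0));
    vector<vector<int>> lastB(m + 1, vector<int>(26, 0));
    for (int i = 1; i <= n; ++i) {
        lastA[i] = lastA[i-1];
        lastA[i][a[i-1] - 'a'] = i;
    }
    for (int j = 1; j <= m; ++j) {
        lastB[j] = lastB[j-1];
        lastB[j][b[j-1] - 'a'] = j;
    }
    vector<vector<long long>> memo(n + 1, vector<long long>(m + 1, -1));
    function<long long(int, int)> go = [&](int i, int j) -> long long {
        if (memo[i][j] != -1) return memo[i][j];
        long long total = 0;
        for (int c = 0; c < 26; ++c) {
            int p = lastA[i][c], q = lastB[j][c];
            // c must appear in both prefixes; count "c" itself plus every
            // common subsequence of the shorter prefixes extended by c
            if (p && q) total += 1 + go(p - 1, q - 1);
        }
        return memo[i][j] = total;
    };
    return go(n, m);
}
```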

Algorithm 2: $O(2^{k+1}n^k)$
Let $S_{i_1,i_2,...,i_k}$ be the set of distinct common subsequences of $s_1[1..i_1], s_2[1..i_2], ..., s_k[1..i_k]$
Partition the common subsequences into two sets:
  • There is at least one string $s_t$ in which the common subsequence does not end at $s_t[i_t]$: we can compute the number of elements of this set using the inclusion-exclusion formula.

  • The common subsequence ends at $s_1[i_1], s_2[i_2], ..., s_k[i_k]$: we count how many new subsequences are in this set. This can also be computed using the inclusion-exclusion formula, by looking at the set of indices $p_1, p_2, ..., p_k$ in which $p_t$ is the index of the last appearance before $i_t$ of the character $s_t[i_t]$.


My solution for algorithm 2 got TLE.

Note
To deal with distinct subsequences, observe that if a subsequence in the set end with a character c, it could always end with the rightmost c. An algorithm often processes indexes array such as p[i] = max{j, j < i AND s[j] = s[j]} (last appearance of the same element, or p[i][c] = max{j, j<=i AND s[j]=c} (last appearance of c)