8.   Strings


This chapter studies basic combinatorial properties of strings, sequences of characters or letters drawn from a fixed alphabet, and introduces algorithms that process strings ranging from fundamental methods at the heart of the theory of computation to practical text-processing methods with a host of important applications.


8.1 String Searching


8.2 Combinatorial properties of bitstrings


8.3 Regular expressions


8.4 Finite-State Automata and the Knuth-Morris-Pratt Algorithm


8.5 Context-Free Grammars


8.6 Tries


8.7 Trie Algorithms


8.8 Combinatorial Properties of Tries


8.9 Larger alphabets

Selected Exercises

8.1

Give two recurrences satisfied by $[z^N]B_P(z)$.


8.2

How long a string of random bits should be taken to be 99% sure that there are at least three consecutive 0s?


8.3

How long a string of random bits should be taken to be 50% sure that there are at least 32 consecutive 0s?


8.6

By considering bitstrings with no runs of two consecutive 0s, evaluate the following sum involving Fibonacci numbers: $\sum_{j\ge0} F_j/2^j$.


8.14

Suppose that a monkey types randomly at a 32-key keyboard. What is the expected number of characters typed before the monkey hits upon the phrase THE QUICK BROWN FOX JUMPED OVER THE LAZY DOG?


8.15

Suppose that a monkey types randomly at a 32-key keyboard. What is the expected number of characters typed before the monkey hits upon the phrase TO BE OR NOT TO BE?


8.22

Suppose that a monkey types randomly at a 2-key keyboard. What is the expected number of bits typed before the monkey hits upon a string of $2k$ alternating 0s and 1s?


8.40

Suppose that a monkey types randomly at a 32-key keyboard that has 26 letters A through Z, the symbols +, *, (, and ), a space key, and a period. What is the expected number of characters typed before the monkey hits upon a legal regular expression? Assume that spaces can appear anywhere in a regular expression and that a legal regular expression must be enclosed in parentheses and have exactly one period, at the end.


8.42

There are ${8\choose5}=56$ different sets of five three-bit bitstrings. Which trie is associated with the most of these sets? The least?


8.44

How many different tries are there with $N$ external nodes?


8.45

What proportion of the external nodes are void in a "random" trie (assuming each different trie structure to be equally likely to occur)?


8.52

Show that $A_N/N$ is equal to $1/\ln 2$ plus a fluctuating term.


8.53

Write a program to compute $A_N$ to within $10^{-9}$ for $N<10^6$ and explore the oscillatory nature of $A_N/N$.


8.57

Solve the recurrence for $p_N$ given in the proof of Theorem 8.9, to within the oscillating term. $$p_N= {1\over 2^{N}} \sum_k{N\choose k}p_k\quad \hbox{for}\ N>1\quad \hbox{with}\ p_0=0\ \hbox{and}\ p_1 = 1$$


Review Questions


Q8.1

Rank these patterns by expected wait time in a random bit string (lowest to highest). (a) 00000 (b) 00001 (c) 01000 (d) 01010 (e) 10101


Q8.2

Give the OGF for the number of bitstrings not containing the pattern 01010.


Q8.3

Give the OGF for the number of bitstrings not containing the pattern 1011.


Q8.4

(J. Chen) Use BGFs to count the expected number of occurrences of 00 in a random binary string of length $N$. (By convention, a substring of $t$ consecutive 0s has $t-1$ occurrences of 00.)