9.   Words and Maps

This chapter covers global properties of words (N-letter strings from an M-letter alphabet), which are well-studied in classical combinatorics (because they model sequences of independent Bernoulli trials) and in classical applied algorithmics (because they model input sequences for hashing algorithms). The chapter also covers random maps (N-letter words from an N-letter alphabet) and discusses relationships with trees and permutations.

9.1 Hashing with Separate Chaining

9.2 Basic Properties of Words

9.3 Birthday Paradox and Coupon Collector Problem

9.4 Occupancy Restrictions and Extremal Parameters

9.5 Occupancy Distributions

9.6 Open Addressing Hashing

9.7 Maps

9.8 Integer Factorization and Maps

Selected Exercises


For $M=365$, how many people are needed to be 99% sure that two have the same birthday?


Give a plot (for $M=365$) of the probability that three people have the same birthday.


For $M=365$, how many people are needed to be 50% sure that three have the same birthday? Four?


What is the probability that one urn will get all the balls when 100 balls are randomly distributed among 100 urns?


What is the probability that each urn will get one ball when 100 balls are randomly distributed among 100 urns?


Find $[z^n]e^{\alpha C(z)}$ where $C(z)$ is the Cayley function (see the discussion at the end of Section 6.14 and in Section 9.7 in this chapter).5


(``Abel's binomial theorem.'') Use the result of the previous exercise and the identity $e^{(\alpha+\beta)C(z)}=e^{\alpha C(z)}e^{\beta C(z)}$ to prove that $$(\alpha+\beta)(n+\alpha+\beta)^{n-1} =\alpha\beta\sum_k{n\choose k}(k+\alpha)^{k-1}(n-k+\beta)^{n-k-1}.$$


A mapping with no repeated integers is a permutation. Give an efficient algorithm for determining whether a mapping is a tree.


Describe the graph structure of partial mappings, where the image of a point may be undefined. Set up the corresponding EGF equations and check that the number of partial mappings of size $N$ is $(N+1)^N$.


Generate 100 random maps of size 100, 1000, and 10000 and empirically verify that about 48% of the nodes are in the largest tree, that about 76% of the nodes are in the largest cycle, and that, in a random map of size $N$, the number of nodes in the longest tail, cycle, and rho-path are about $1.74\sqrt{N}$, $.78\sqrt{N}$, and $2.41\sqrt{N}$, respectively.


Use Floyd's method to test the random number generator on your machine for short cycles.


Show that the probability that a random mapping of size N has no singleton cycles is $\sim 1/e$, the same as for permutations (!).

Review Questions


Give the EGF for random mappings with no singleton cycles. Express your answer as a function of the Cayley function $C(z) = e^{C(z)}$.


This problem involves enumerating the mappings where every character appears exactly twice or not at all. Such mappings must be of even length. There are 2 such mappings of length 2 (11 and 22) and 36 of them of length 4. By a direct combinatorial argument, it is not difficult to derive the explicit form $$C_{2n} = {2n\choose n}{(2n)!\over 2^n}.$$ Give an explicit formula for the EGF $$C(z) = \sum_{n\ge0}{C_{2n}\over(2n)!}z^{2n}$$ and an asymptotic estimate (~-approximation) of $C_{2n}$. Extra credit: Give a full derivation via analytic combinatorics.


Give a ~approximation for the proportion of mappings of elements that have more than one connected component.