5. Analytic Combinatorics
This chapter introduces analytic combinatorics, a modern approach to
the study of combinatorial structures of the sort that we encounter
frequently in the analysis of algorithms. The approach is predicated
on the idea that combinatorial structures are typically defined by
simple formal rules that are the key to learning their properties.
One eventual outgrowth of this observation is that a relatively small set of
transfer theorems ultimately yields accurate approximations of the
quantities that we seek. Figure 5.1 gives an general overview of the process.
Generating functions are the central objects of study in analytic
combinatorics. In the first place, we directly translate formal
definitions of combinatorial objects
into definitions of generating functions that enumerate objects or
describe their properties. In the second place, we use classical
mathematical analysis to extract estimates of
generating function coefficients.
5.1 Formal Basis
5.2 Symbolic Method for Unlabelled Classes
5.3 Symbolic Method for Labelled Classes
5.4 Symbolic Method for Parameters
Theorem 5.3 (Symbolic method for unlabelled class OBGFs)
Let $\cal A$ and $\cal B$ be unlabelled classes of combinatorial objects.
If $A(z, u)$ and $B(z, u)$
are the OBGFs associated with $\cal A$ and $\cal B$, respectively, where $z$
marks size and $u$ marks a parameter, then
$A(z, u)+B(z, u)$ is the OBGF associated with $\cal A +\cal B$,
$A(z, u)B(z, u)$ is the OBGF associated with $\cal A \times \cal B$, and
$\displaystyle{1\over 1-A(z, u)}$ is the OBGF associated with $SEQ({\cal A})$.
Proof. Omitted.
Leaves in binary trees.
What proportion of the internal nodes in a binary tree of size $N$
have two external children? Such nodes are called
leaves. You can check that the
total numbers of such nodes for $N=0,1,2,3,$ and $4$ are $0,1,2,6,$ and $20$, respectively.
Dividing by the Catalan numbers, the associated proportions are
$0,1,1,6/5$ and $10/7$. In terms of the BGF
$$T(z, u) = \sum_{t\in\cal T}z^{|t|}u^{\hbox{leaves}(t)}$$
the following are the coefficients of $z^0$,$z^1$, $z^2$, $z^3$, $z^4$, respectively,
and are reflected directly in the trees in Figure 5.2:
$$u^0$$
$$u^1$$
$$u^1+ u^1$$
$$u^1+ u^1+ u^2+ u^1+ u^1$$
$$u^1+ u^1+u^2+u^1+u^1+u^2+u^2+u^1+u^1+u^2+u^1+u^1+u^2+u^2.$$
Adding these terms, we know that
$$T(z, u) = 1 + z^1u + 2z^2u + z^3(4u + u^2) + z^4(8u + 6u^2) + \ldots\,.$$
Checking small values, we find that
$$T(z, 1) = 1 + z^1 + 2z^2 + 5z^3 + 14z^4 + \ldots$$
and
$$T_u(z, 1) = z^1 + 2z^2 + 6z^3 + 20z^4 + \ldots$$
as expected.
To derive a GF equation with the symbolic method, we add ${\cal Z}_{\bullet}$ to both sides
of the standard recursive construction to get
$${\cal T} + {\cal Z}_{\bullet}= {\cal E} + {\cal Z}_{\bullet} + {\cal Z}_{\bullet}\times{\cal T} \times{\cal T}.$$
This gives us a way to mark leaves (by using the BGF $zu$ for the $ {\cal Z}_{\bullet}$ term on the right)
and to balance the equation for the tree of size 1.
Applying Theorem 5.3 (using the BGF $z$ for the ${\cal Z}_{\bullet}$ term
on the left and the ${\cal Z}_{\bullet}$ factor on the rightmost term, since neither corresponds to
a leaf)
immediately gives the functional equation
$$T(z, u) + z=1+zu+zT(z, u)^2.$$
Setting $u=1$ gives the OGF for the Catalan numbers as expected
and differentiating with respect to $u$ and evaluating at $u=1$ gives
$$\eqalign{
T_u(z, 1) &= z + 2zT(1,z)T_u(z, 1)\cr
&= {z\over 1- 2zT(z, 1)}\cr
&= {z\over\sqrt{1-4z}}.\cr
}$$
Thus, the average number of internal nodes
with both nodes external in a binary tree of size $n$ is
$${[z^n]\displaystyle{z\over\sqrt{1-4z}}\over\displaystyle{1\over n+1}{2n\choose n}}={\displaystyle{2n-2\choose n-1}\over\displaystyle{1\over n+1}{2n\choose n}}={(n+1)n\over2(2n-1)}$$
which tends to $n/4$ in the limit. About $1/4$ of the internal nodes in
a binary tree are leaves.
5.5 Generating Function Coefficient Asymptotics
Theorem 5.5 (Radius-of-convergence transfer theorem)
Let $f(z)$ have radius of convergence strictly larger
than 1 and assume that $f(1)\ne0$. For any
real $\alpha\not\in\{0,-1,-2,\ldots\}$, there holds
$$[z^n] {f(z)\over (1-z)^\alpha}\sim f(1) {n+\alpha-1\choose
n}\sim {f(1)\over \Gamma(\alpha)}n^{\alpha-1}.$$
Proof.
Let $f(z)$ have
radius of convergence $>r$, where $r>1$.
We know that $f_n\equiv[z^n]f(z)=O(r^{-n})$
from radius-of-convergence bounds, and in particular the sum $\sum_n f_n$
converges to $f(1)$ geometrically fast.
It is then a simple matter to analyze
the convolution:
$$\eqalign{
[z^n] {f(z)\over(1-z)^{\alpha}}
&=f_0 {n+\alpha-1\choose n}+f_1 {n+\alpha-2\choose n-1}
+\cdots+f_n {\alpha-1\choose 0}\cr
&={n+\alpha-1\choose n}\Bigl(f_0+f_1{n\over n+\alpha-1}\cr
&\qquad\qquad+f_2{n(n-1)\over (n+\alpha-1)(n+\alpha-2)}\cr
&\qquad\qquad+f_3{n(n-1)(n-2)\over (n+\alpha-1)(n+\alpha-2)(n+\alpha-3)}+\cdots\Bigr).\cr
}
$$
The term of index $j$ in this sum is
$$f_j {n(n-1)\cdots(n-j+1)\over (n+\alpha-1)(n+\alpha-2)\cdots(n+\alpha-j)},$$
which tends to~$f_j$ when $n\to+\infty$. From this, we deduce
that
$$[z^n] f(z)(1-z)^{-\alpha}\sim {n+\alpha-1\choose n}
(f_0+f_1+\cdots f_n)\sim
f(1){n+\alpha-1\choose n},$$
since the partial sums $f_0+\cdots+f_n$ converge to $f(1)$
geometrically fast. The approximation to the binomial coefficient
follows from the Euler-Maclaurin formula.
In general, coefficient asymptotics are determined by the
behavior of the GF near where it diverges.
When the radius of convergence is not 1, we can rescale to a function
for which the theorem applies. Such rescaling always introduces a
multiplicative exponential factor.
Corollary.
Let $f(z)$ have radius of convergence strictly larger
than $\rho$ and assume that $f(\rho)\ne0$. For any
real $\alpha\not\in\{0,-1,-2,\ldots\}$, there holds
$$[z^n] {f(z)\over (1-z/\rho)^\alpha}\sim {f(\rho) \over \Gamma(\alpha)}\rho^{-n} n^{\alpha-1}.$$
Proof.Change variables from $z$ to $z/\rho$ in Theorem 5.5.
While it has limitations,
Theorem 5.5 (and its corollary) is effective for extracting coefficients from many of
the GFs that we encounter in this book. It is an outstanding example of the analytic
transfer theorems that comprise the second phase of analytic combinatorics.
Generalized derangements.
We have used the symbolic method to show that the probability
that a given permutation has no cycles of length less than or equal to $M$ is
is $[z^N]P^*_{>M}(z)$, where
$$P^*_{>M}(z) = {e^{-z - z^2\!/2 \ldots - z^M\!/M}\over 1-z}$$
From this expression, Theorem 5.5 gives
immediately
$$[z^N]P^*_{>M}(z)\sim{1\over e^{H_M}}.$$
One of the prime tenets of
analytic combinatorics is that detailed calculations are often
not necessary, as general transfer theorems can provide accurate results
for a great variety of combinatorial classes, as they did in this case.
Catalan numbers and leaves in binary trees.
Similarly, the corollary to Theorem 5.5 immediately provides a transfer from the Catalan GF
$$T(z) = {1 - \sqrt{1 -4z}\over 2}$$
to the asymptotic form of its coefficients: Ignore the constant term, and
take $\alpha = -1/2$ and $f(z) = -1/2$ to get the asymptotic result
$$T_N\sim{4^N\over N\sqrt{\pi N}}.$$
Also, we have a simpler conclusion to the above proof that the average number of leaves in binary trees is $\sim N/4$, since the corollary to Theorem 5.5 immediately gives
$$[z^N] {z\over\sqrt{1 -4z}} = {4^N\over 4\sqrt{\pi N}}$$
With results such as Theorem 5.5, the asymptotic form of coefficients is directly
"transferred" from elements like $(1-z)^{-\alpha}$ (called
"singular elements")
that play a role very similar to partial fraction elements in the
analysis of rational functions.
And deeper mathematical truths are at play.
Theorem 5.5 is only the simplest of a whole set of similar results originating with
Darboux in the last century and further developed by Polya, Szego, Bender, and others. These methods are discussed in
detail in Analytic Combinatorics by Flajolet and Sedgewick.
Unlike what we could do here,
their full development requires
the theory of functions of a complex variable.
This approach to asymptotics is called singularity analysis.
Selected Exercises
5.1
How many bitstrings of length $N$ have no 000?5.3
Let $\cal U$ be the set of binary trees with the size of a tree defined to be the total number of nodes (internal plus external), so that the generating function for its counting sequence is $U(z) = z + z^3 + 2 z^5 + 5z^7 + 14z^9 + \ldots\,\,$. Derive an explicit expression for $U(z)$.5.7
Derive an EGF for the number of permutations whose cycles are all of odd length.5.15
Find the average number of internal nodes in a binary tree of size $N$ with both children internal.5.16
Find the average number of internal nodes in a binary tree of size $N$ with one child internal and one child external.5.23
Show that the probability that all of the cycles are of odd length in a random permutation of length $N$ is $\sim 1/\sqrt{\pi N/2}$ (see Exercise 5.7).