5.   Analytic Combinatorics


This chapter introduces analytic combinatorics, a modern approach to the study of combinatorial structures of the sort that we encounter frequently in the analysis of algorithms. The approach is predicated on the idea that combinatorial structures are typically defined by simple formal rules that are the key to learning their properties. One eventual outgrowth of this observation is that a relatively small set of transfer theorems ultimately yields accurate approximations of the quantities that we seek. Figure 5.1 gives an general overview of the process. Generating functions are the central objects of study in analytic combinatorics. In the first place, we directly translate formal definitions of combinatorial objects into definitions of generating functions that enumerate objects or describe their properties. In the second place, we use classical mathematical analysis to extract estimates of generating function coefficients.


5.1 Formal Basis


5.2 Symbolic Method for Unlabelled Classes


5.3 Symbolic Method for Labelled Classes


5.4 Symbolic Method for Parameters


Theorem 5.3 (Symbolic method for unlabelled class OBGFs) Let $\cal A$ and $\cal B$ be unlabelled classes of combinatorial objects. If $A(z, u)$ and $B(z, u)$ are the OBGFs associated with $\cal A$ and $\cal B$, respectively, where $z$ marks size and $u$ marks a parameter, then


   $A(z, u)+B(z, u)$ is the OBGF associated with $\cal A +\cal B$,


   $A(z, u)B(z, u)$ is the OBGF associated with $\cal A \times \cal B$, and


   $\displaystyle{1\over 1-A(z, u)}$ is the OBGF associated with $SEQ({\cal A})$.


Proof. Omitted.


Leaves in binary trees. What proportion of the internal nodes in a binary tree of size $N$ have two external children? Such nodes are called leaves. You can check that the total numbers of such nodes for $N=0,1,2,3,$ and $4$ are $0,1,2,6,$ and $20$, respectively. Dividing by the Catalan numbers, the associated proportions are $0,1,1,6/5$ and $10/7$. In terms of the BGF $$T(z, u) = \sum_{t\in\cal T}z^{|t|}u^{\hbox{leaves}(t)}$$ the following are the coefficients of $z^0$,$z^1$, $z^2$, $z^3$, $z^4$, respectively, and are reflected directly in the trees in Figure 5.2: $$u^0$$ $$u^1$$ $$u^1+ u^1$$ $$u^1+ u^1+ u^2+ u^1+ u^1$$ $$u^1+ u^1+u^2+u^1+u^1+u^2+u^2+u^1+u^1+u^2+u^1+u^1+u^2+u^2.$$ Adding these terms, we know that $$T(z, u) = 1 + z^1u + 2z^2u + z^3(4u + u^2) + z^4(8u + 6u^2) + \ldots\,.$$ Checking small values, we find that $$T(z, 1) = 1 + z^1 + 2z^2 + 5z^3 + 14z^4 + \ldots$$ and $$T_u(z, 1) = z^1 + 2z^2 + 6z^3 + 20z^4 + \ldots$$ as expected. To derive a GF equation with the symbolic method, we add ${\cal Z}_{\bullet}$ to both sides of the standard recursive construction to get $${\cal T} + {\cal Z}_{\bullet}= {\cal E} + {\cal Z}_{\bullet} + {\cal Z}_{\bullet}\times{\cal T} \times{\cal T}.$$ This gives us a way to mark leaves (by using the BGF $zu$ for the $ {\cal Z}_{\bullet}$ term on the right) and to balance the equation for the tree of size 1. Applying Theorem 5.3 (using the BGF $z$ for the ${\cal Z}_{\bullet}$ term on the left and the ${\cal Z}_{\bullet}$ factor on the rightmost term, since neither corresponds to a leaf) immediately gives the functional equation $$T(z, u) + z=1+zu+zT(z, u)^2.$$ Setting $u=1$ gives the OGF for the Catalan numbers as expected and differentiating with respect to $u$ and evaluating at $u=1$ gives $$\eqalign{ T_u(z, 1) &= z + 2zT(1,z)T_u(z, 1)\cr &= {z\over 1- 2zT(z, 1)}\cr &= {z\over\sqrt{1-4z}}.\cr }$$ Thus, the average number of internal nodes with both nodes external in a binary tree of size $n$ is $${[z^n]\displaystyle{z\over\sqrt{1-4z}}\over\displaystyle{1\over n+1}{2n\choose n}}={\displaystyle{2n-2\choose n-1}\over\displaystyle{1\over n+1}{2n\choose n}}={(n+1)n\over2(2n-1)}$$ which tends to $n/4$ in the limit. About $1/4$ of the internal nodes in a binary tree are leaves.





5.5 Generating Function Coefficient Asymptotics



Theorem 5.5 (Radius-of-convergence transfer theorem) Let $f(z)$ have radius of convergence strictly larger than 1 and assume that $f(1)\ne0$. For any real $\alpha\not\in\{0,-1,-2,\ldots\}$, there holds $$[z^n] {f(z)\over (1-z)^\alpha}\sim f(1) {n+\alpha-1\choose n}\sim {f(1)\over \Gamma(\alpha)}n^{\alpha-1}.$$ Proof. Let $f(z)$ have radius of convergence $>r$, where $r>1$. We know that $f_n\equiv[z^n]f(z)=O(r^{-n})$ from radius-of-convergence bounds, and in particular the sum $\sum_n f_n$ converges to $f(1)$ geometrically fast. It is then a simple matter to analyze the convolution: $$\eqalign{ [z^n] {f(z)\over(1-z)^{\alpha}} &=f_0 {n+\alpha-1\choose n}+f_1 {n+\alpha-2\choose n-1} +\cdots+f_n {\alpha-1\choose 0}\cr &={n+\alpha-1\choose n}\Bigl(f_0+f_1{n\over n+\alpha-1}\cr &\qquad\qquad+f_2{n(n-1)\over (n+\alpha-1)(n+\alpha-2)}\cr &\qquad\qquad+f_3{n(n-1)(n-2)\over (n+\alpha-1)(n+\alpha-2)(n+\alpha-3)}+\cdots\Bigr).\cr } $$ The term of index $j$ in this sum is $$f_j {n(n-1)\cdots(n-j+1)\over (n+\alpha-1)(n+\alpha-2)\cdots(n+\alpha-j)},$$ which tends to~$f_j$ when $n\to+\infty$. From this, we deduce that $$[z^n] f(z)(1-z)^{-\alpha}\sim {n+\alpha-1\choose n} (f_0+f_1+\cdots f_n)\sim f(1){n+\alpha-1\choose n},$$ since the partial sums $f_0+\cdots+f_n$ converge to $f(1)$ geometrically fast. The approximation to the binomial coefficient follows from the Euler-Maclaurin formula.




In general, coefficient asymptotics are determined by the behavior of the GF near where it diverges. When the radius of convergence is not 1, we can rescale to a function for which the theorem applies. Such rescaling always introduces a multiplicative exponential factor.


Corollary. Let $f(z)$ have radius of convergence strictly larger than $\rho$ and assume that $f(\rho)\ne0$. For any real $\alpha\not\in\{0,-1,-2,\ldots\}$, there holds $$[z^n] {f(z)\over (1-z/\rho)^\alpha}\sim {f(\rho) \over \Gamma(\alpha)}\rho^{-n} n^{\alpha-1}.$$ Proof.Change variables from $z$ to $z/\rho$ in Theorem 5.5.



While it has limitations, Theorem 5.5 (and its corollary) is effective for extracting coefficients from many of the GFs that we encounter in this book. It is an outstanding example of the analytic transfer theorems that comprise the second phase of analytic combinatorics.


Generalized derangements. We have used the symbolic method to show that the probability that a given permutation has no cycles of length less than or equal to $M$ is is $[z^N]P^*_{>M}(z)$, where $$P^*_{>M}(z) = {e^{-z - z^2\!/2 \ldots - z^M\!/M}\over 1-z}$$ From this expression, Theorem 5.5 gives immediately $$[z^N]P^*_{>M}(z)\sim{1\over e^{H_M}}.$$ One of the prime tenets of analytic combinatorics is that detailed calculations are often not necessary, as general transfer theorems can provide accurate results for a great variety of combinatorial classes, as they did in this case.


Catalan numbers and leaves in binary trees. Similarly, the corollary to Theorem 5.5 immediately provides a transfer from the Catalan GF $$T(z) = {1 - \sqrt{1 -4z}\over 2}$$ to the asymptotic form of its coefficients: Ignore the constant term, and take $\alpha = -1/2$ and $f(z) = -1/2$ to get the asymptotic result $$T_N\sim{4^N\over N\sqrt{\pi N}}.$$ Also, we have a simpler conclusion to the above proof that the average number of leaves in binary trees is $\sim N/4$, since the corollary to Theorem 5.5 immediately gives $$[z^N] {z\over\sqrt{1 -4z}} = {4^N\over 4\sqrt{\pi N}}$$




With results such as Theorem 5.5, the asymptotic form of coefficients is directly "transferred" from elements like $(1-z)^{-\alpha}$ (called "singular elements") that play a role very similar to partial fraction elements in the analysis of rational functions. And deeper mathematical truths are at play. Theorem 5.5 is only the simplest of a whole set of similar results originating with Darboux in the last century and further developed by Polya, Szego, Bender, and others. These methods are discussed in detail in Analytic Combinatorics by Flajolet and Sedgewick. Unlike what we could do here, their full development requires the theory of functions of a complex variable. This approach to asymptotics is called singularity analysis.

Selected Exercises


5.1

How many bitstrings of length $N$ have no 000?


5.3

Let $\cal U$ be the set of binary trees with the size of a tree defined to be the total number of nodes (internal plus external), so that the generating function for its counting sequence is $U(z) = z + z^3 + 2 z^5 + 5z^7 + 14z^9 + \ldots\,\,$. Derive an explicit expression for $U(z)$.


5.7

Derive an EGF for the number of permutations whose cycles are all of odd length.

5.15

Find the average number of internal nodes in a binary tree of size $N$ with both children internal.


5.16

Find the average number of internal nodes in a binary tree of size $N$ with one child internal and one child external.


5.23

Show that the probability that all of the cycles are of odd length in a random permutation of length $N$ is $\sim 1/\sqrt{\pi N/2}$ (see Exercise 5.7).


Review Questions

Q5.1

Give combinatorial constructions for (a) binary strings with no occurrence of 01 (b) binary strings with no occurrence of 11


Q5.2

Give the approximate value (leading term) of (a) $[z^n]{1\over1-3z}\ln{1\over 1-2z}$ (b) $[z^n]{1\over\sqrt{1-3z}}\ln{1\over 1-2z}$


Q5.3

For each of the following combinatorial constructions, indicate whether it describes permutations, binary strings, cycles, derangements, or none of these. $$A = Z+A\times A$$ $$A = SET(CYC_{>1}(Z)$$ $$A = Z+E+Z\star A$$ $$A = CYC_{>0}(Z)$$ $$A = A\times(Z_0+Z_1)$$ $$A = SEQ(Z)$$


Q5.4

(A. Yin) How many ways are there to climb $N$ stairs using strides of either 1 or 3 steps at a time? Give an answer of the form $c_1\cdot c_2^N$ where $c_1$ and $c_2$ are constants (and give approximate values of the constants).