Lecture 31: Trees

In this lecture, we will study trees:

Basic Definition
Properties of Trees with some proofs.
Pruefer Sequences: Representing and Counting Labelled Trees using Cayley's Theorem.
If time permits, counting rooted/unlabelled trees using Catalan Numbers.

Trees

You have most probably studied rooted trees and binary search trees as a data structure for organizing lists of numbers and efficiently performing insertions and deletions. However, a tree is a useful structure in many other parts of CS:

Game trees in AI (how can we make computers play games like chess, go,…).
Decision trees in ML.
Spanning trees in network routing.
Parse trees for compilers.
The list goes on and on

Tree (definition)

Let us start with the simplest kind of trees: unrooted and undirected trees.

Definition

An (unrooted) tree is an undirected graph $G = (V, E)$ such that

$G$ is connected (the entire graph is a single maximal connected component),
$G$ is acyclic (there are no cycles in $G$).

A rooted tree is a connected, acyclic graph with a special node $v \in V$ that is called the root of the tree. You may have studied rooted trees in your data structures class. With a root, it is possible to define a parent and children for each node. But without a root, we will regard the tree simply as a connected, acyclic graph.
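The two conditions in the definition can be checked mechanically. The following is a minimal sketch, assuming the graph is given as a list of node labels and a list of undirected edges (the function name `is_tree` and this input format are illustrative choices, not part of the lecture). It runs a depth-first search, reports a cycle if it meets an already-visited node through an edge other than the one it arrived by, and finally checks that every node was reached.

```python
from collections import defaultdict

def is_tree(nodes, edges):
    """Return True if the undirected graph (nodes, edges) is connected and acyclic."""
    nodes = list(nodes)
    if not nodes:
        return False  # assumption made here: the empty graph is not counted as a tree
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            return False  # a self-loop is a cycle
        adj[u].add(v)
        adj[v].add(u)
    # Depth-first search from an arbitrary node, remembering how each node was reached.
    parent = {nodes[0]: None}
    stack = [nodes[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                stack.append(w)
            elif parent[u] != w:
                return False  # an edge to a visited node other than our parent closes a cycle
    # Acyclic so far; the graph is a tree exactly when every node was reached.
    return len(parent) == len(nodes)
```

For instance, `is_tree([1, 2, 3], [(1, 2), (2, 3), (3, 1)])` is rejected because of the triangle, while dropping any one of those three edges leaves a tree.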

Examples

Here are some examples of unrooted trees:

Non-Examples

The following graph is not a tree. It has a cycle:

Here is another example that has more than one maximal connected component and is therefore not a tree:

Properties of Trees

Leaves of a Tree

A leaf of an unrooted tree is a node that has degree $1$. Let us write down the leaves of the following tree examples:

Leaves are {1,6,7,8,9} .

Leaves are {1,3,4,5,6,7,8,9} .

Leaves are {3,4,5} .
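Finding the leaves amounts to counting degrees. Here is a short sketch, assuming the tree is given as a list of undirected edges; the function name `leaves` and the small example trees in the usage note are hypothetical and are not the trees from the figures.

```python
from collections import Counter

def leaves(edges):
    """Return the sorted list of nodes of degree 1 in a graph given by its edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d == 1)
```

On the star with edges `[(1, 2), (2, 3), (2, 4)]` this returns `[1, 3, 4]`, and on the path `[(1, 2), (2, 3), (3, 4)]` it returns `[1, 4]`. A tree with a single node has no edges, so this sketch returns an empty list for it.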

Claim Every tree has a leaf.

Proof

If the tree just has one node, then it is trivially a leaf. If the tree has two nodes connected by an edge, both nodes are leaves. Let us focus on trees with three or more nodes.

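Let us assume, for the sake of contradiction, that there exists a tree $T$ with three or more nodes that does not have a leaf. Since $T$ is connected, no node has degree $0$, and therefore every node $v$ in this tree has $\mathsf{degree}(v) \geq 2$. We are going to show that $T$ has a cycle, yielding a contradiction with the assumption that it is a tree.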

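Start from any node $v_0$ in the tree and do a walk as follows:

Take any edge out of $v_0$ to reach $v_1$.
For any node $v_i$ with $i \geq 1$, take an edge other than the edge $v_{i-1} - v_i$ that we took to enter $v_i$, and call the node it leads to $v_{i+1}$. Since $\mathsf{degree}(v_i) \geq 2$, such an edge is always available.

Note that the walk above can be continued for arbitrarily many steps. However, since the number of vertices is finite, the walk must repeat a vertex. Let $v_i = v_{i+k}$ for some $i \geq 0$ and $k > 0$, chosen so that $v_{i+k}$ is the first repeated vertex of the walk. The vertices $v_i, v_{i+1}, \ldots, v_{i+k-1}$ are then distinct, and because the walk never immediately reuses the edge it just traversed, $k \geq 3$. We can now conclude the existence of a cycle $v_i, v_{i+1}, \ldots, v_{i+k} = v_i$ with $v_i$ in it. Therefore the graph is not a tree, yielding a contradiction.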

Thus, we have concluded that every tree has a leaf.

Number of Edges

Claim A tree with $n$ nodes has $n-1$ edges.

Proof

Proof is by weak induction on the number of nodes $n$.

Base Case: Take any tree with $n = 1$ node. There is just one such tree and it has $1-1=0$ edges.

Inductive Hypothesis: Let us assume that all trees with $n$ nodes have $n-1$ edges. We will show that all trees with $n+1$ nodes have $n$ edges.

Take some tree $T_{n+1}$ with $n+1$ nodes. It must have a leaf $\ell$. Removing the leaf gives us a graph with $n$ nodes that is still connected and acyclic, that is, a tree, which by the inductive hypothesis must have $n-1$ edges in it. The leaf itself was connected to the rest of the tree by one edge. Therefore $T_{n+1}$ has $(n-1)+1 = n$ edges.
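The induction can be replayed in code by peeling off one leaf at a time: each removal deletes exactly one node and one edge, and the process stops when a single node is left. This is only an illustrative sketch that assumes its input really is a tree (otherwise no leaf may be found and the `next` call raises `StopIteration`); the name `count_edges_by_peeling` is hypothetical.

```python
def count_edges_by_peeling(nodes, edges):
    """Repeatedly remove a leaf of a tree and return how many edges were removed in total."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    removed = 0
    while len(adj) > 1:
        # A tree with at least two nodes always has a leaf (a node of degree 1).
        leaf = next(v for v in adj if len(adj[v]) == 1)
        (neighbor,) = adj.pop(leaf)      # the single node the leaf is attached to
        adj[neighbor].discard(leaf)      # delete the leaf's one edge
        removed += 1
    return removed
```

Starting from $n$ nodes, the loop runs $n-1$ times before one node remains, which matches the $n-1$ edges of the claim.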

Number of Leaves

Claim Any tree with $n > 1$ nodes has at least two leaves.

We will prove this in class by the following argument. We already know that any tree has at least one leaf.

Proof

Let us assume that there is a tree with $n > 1$ nodes and exactly one leaf $\ell$.

Therefore all nodes other than $\ell$ have degree $\geq 2$.
Sum of degrees of all nodes $\geq 2(n-1) + 1 = 2n - 1$.
However, we know that sum of degrees of all nodes $= 2 \times$ number of edges.
Therefore, number of edges $\geq \left\lceil \frac{2n-1}{2} \right\rceil = n$.
However, a tree has precisely $n-1$ edges.
This leads to a contradiction.

Paths in Trees

Claim Let $u$ and $v$ be two nodes in a tree. There is precisely one path from $u$ to $v$ in the tree.

Argument in class Since the tree is itself a single connected component, there has to be at least one path from $u$ to $v$.

We will argue that having two distinct paths will necessarily imply that the tree has a cycle which will lead to a contradiction.
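Because the path between two nodes of a tree is unique, any search that finds one path has found the only one. The sketch below uses a depth-first search that records how each node was reached and then walks back from the goal. It assumes the tree is given as an edge list and that `None` is not used as a node label; `find_path` is an illustrative name.

```python
def find_path(edges, start, goal):
    """Return the path from start to goal in a tree as a list of nodes, or None if unreachable."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    parent = {start: None}   # parent[w] is the node from which w was first reached
    stack = [start]
    while stack:
        u = stack.pop()
        if u == goal:
            break
        for w in adj.get(u, []):
            if w not in parent:
                parent[w] = u
                stack.append(w)
    if goal not in parent:
        return None
    # Walk back from the goal to the start along the recorded parents.
    path = []
    node = goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]
```

On the tree with edges `[(1, 2), (2, 3), (2, 4), (4, 5)]`, the call `find_path(edges, 1, 5)` returns `[1, 2, 4, 5]`.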

Counting Labelled Trees and Pruefer Sequences

We will now present a very elegant method of representing labelled trees and counting them using Pruefer sequences.

Labelled vs. Unlabelled Trees

We distinguish between instances of trees where the names of the vertices matter and instances where all that matters is the connections between vertices. The former class of trees are called labelled trees and the latter are called unlabelled trees.

Take the following three trees:

They have the same number of nodes and are "isomorphic" to each other. The tree on the left and the tree in the middle differ by exchanging the labels $1$ and $4$ everywhere. Similarly, the tree on the right is obtained from the tree in the middle by exchanging the labels $2$ and $6$.

Trees	Isomorphism
1 vs. 2	1 <-> 4
2 vs. 3	2 <-> 6
1 vs. 3	1 <-> 4, 2 <-> 6

We can view a tree in two ways:

Labelled Tree: The labels on the nodes actually matter and two isomorphic trees with different labels are actually different trees. In particular, the above example represents different labelled trees.
Unlabelled Tree: The labels do not matter. Isomorphic trees count as the same. In particular, all trees in the above example are the same.

All the examples above represent the following unlabelled tree:

Pruefer Sequences

Given a labelled tree $T$, we can represent it by a sequence of numbers called its Pruefer Sequence or its Pruefer Code. Let us assume that the labels for the nodes of the tree are the numbers from $1$ to $n$.

The basic idea behind a Pruefer Sequence is to keep removing the leaves one at a time and write down a number corresponding to each leaf that we remove:

Take the lowest numbered leaf in the tree and remove it.
Append the number of the node that the removed leaf was connected to, to the Pruefer sequence.
Repeat until just two vertices remain.
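The three steps above can be sketched as follows. The code assumes the tree has nodes labelled $1$ to $n$ and is given as a list of undirected edges; the name `prufer_encode` and the use of a min-heap to find the lowest numbered leaf are implementation choices for this sketch, not something prescribed by the lecture.

```python
import heapq

def prufer_encode(n, edges):
    """Return the Pruefer sequence of a tree on nodes 1..n given by its edge list."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Min-heap of current leaves, so the lowest numbered leaf is always on top.
    leaf_heap = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaf_heap)
    code = []
    for _ in range(n - 2):               # stop when just two vertices remain
        leaf = heapq.heappop(leaf_heap)
        (neighbor,) = adj[leaf]          # a leaf has exactly one neighbor
        code.append(neighbor)
        adj[neighbor].discard(leaf)      # remove the leaf from the tree
        adj[leaf].clear()
        if len(adj[neighbor]) == 1:      # the neighbor may have become a leaf
            heapq.heappush(leaf_heap, neighbor)
    return code
```

For example, the tree on five nodes with edges `[(3, 2), (4, 1), (1, 2), (2, 5)]` is encoded as `[2, 1, 2]`: the leaves removed are 3, 4 and 1, in that order.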

Example

Let us write down the Pruefer sequence for this tree:

To start with, the smallest numbered leaf is $3$, which is connected to $2$. We remove $3$ from the tree and add $2$ to the Pruefer sequence. We get the tree:

The smallest numbered leaf is now $4$, which is connected to $1$. We remove $4$ and add $1$ to the Pruefer sequence, to get the tree:

The smallest numbered leaf is now $1$, which is connected to $2$. Therefore, we remove $1$ and add $2$ to the Pruefer sequence.

Only two nodes remain, which completes the process. We get the Pruefer sequence: 2,1,2

Example-2

Let us try other trees and write down their Pruefer sequences.

The Pruefer sequence is 2,2,2,2,2,2,2 .

Example-3

The Pruefer sequence is 2,5,5,4,3,3,2 .

Reconstructing Trees From Pruefer Code

Let us now figure out what a valid Pruefer Sequence is and how to reconstruct a tree uniquely from its Pruefer Sequence.

Pruefer Sequences

For a tree with $n$ nodes assumed to be labelled $1, 2, \ldots, n$, its Pruefer sequence has

$n-2$ numbers in the sequence.
Each number can be from $1$ to $n$.
In fact, any sequence of $n-2$ numbers from $1$ to $n$ is possible.

Reconstruction Algorithm: Example

Let us try and reconstruct a tree from its given Pruefer sequence: 2,5,5,4,3,3,2

Q1 How many nodes does the tree have? 9, since the sequence has $7 = n-2$ numbers.

At each step, the algorithm looks at the smallest number that has not yet been resolved and does not appear in the remaining Pruefer sequence.

The smallest number that does not appear in the sequence is $1$.
Therefore we conclude that $1$ must be the first leaf removed and that it is connected to $2$.

The remaining sequence is 5,5,4,3,3,2. We remember that $1$ has been resolved: $E = \{(1,2)\}$.

The smallest unresolved number that does not appear is $6$.
Therefore, we conclude that $6$ must have been a leaf at this stage and is connected to $5$.

The remaining sequence is 5,4,3,3,2. We remember that $1, 6$ have been resolved: $E = \{(1,2), (5,6)\}$.

The smallest unresolved number that does not appear is $7$, and it must be connected to $5$.

The remaining sequence is 4,3,3,2. We remember that $1, 6, 7$ have been resolved: $E = \{(1,2), (5,6), (5,7)\}$.

The smallest unresolved number that does not appear is $5$, and it must be connected to $4$.

The remaining sequence is 3,3,2. We remember that $1, 5, 6, 7$ have been resolved: $E = \{(1,2), (5,6), (5,7), (4,5)\}$.

Here is the complete run of the algorithm

Remaining Sequence	Resolved Nodes	Smallest number not seen so far	Edge Inferred
2,5,5,4,3,3,2	-	1	(2,1), (1,2)
5,5,4,3,3,2	1	6	(6,5), (5,6)
5,4,3,3,2	1,6	7	(5,7), (7,5)
4,3,3,2	1,6,7	5	(5,4), (4,5)
3,3,2	1,5,6,7	4	(4,3), (3,4)
3,2	1,4,5,6,7	8	(8,3), (3,8)
2	1,4,5,6,7,8	3	(3,2), (2,3)
	1,3,4,5,6,7,8	-	(2,9), (9,2)

The last step is special: the Pruefer sequence is exhausted and exactly two nodes, here $2$ and $9$, remain unresolved, so we conclude that they must form an edge.
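The run above can be sketched in code as follows. The sketch assumes node labels $1$ to $n$ with $n$ equal to the sequence length plus two; the name `prufer_decode`, the counter of remaining occurrences and the min-heap of candidate leaves are implementation choices made for this illustration.

```python
import heapq
from collections import Counter

def prufer_decode(code):
    """Rebuild the edge list of the labelled tree on nodes 1..n from its Pruefer sequence."""
    n = len(code) + 2
    remaining = Counter(code)  # occurrences of each label in the not-yet-processed sequence
    # Unresolved nodes that do not appear in the remaining sequence are the current leaves.
    candidates = [v for v in range(1, n + 1) if remaining[v] == 0]
    heapq.heapify(candidates)
    edges = []
    for x in code:
        leaf = heapq.heappop(candidates)   # smallest unresolved number not in the sequence
        edges.append((leaf, x))            # it must have been attached to x
        remaining[x] -= 1
        if remaining[x] == 0:              # x no longer appears, so it becomes a candidate
            heapq.heappush(candidates, x)
    # Sequence exhausted: the two unresolved nodes form the last edge.
    u = heapq.heappop(candidates)
    v = heapq.heappop(candidates)
    edges.append((u, v))
    return edges
```

Calling `prufer_decode([2, 5, 5, 4, 3, 3, 2])` yields the edges `(1, 2), (6, 5), (7, 5), (5, 4), (4, 3), (8, 3), (3, 2), (2, 9)`, in the same order as the rows of the table.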

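Cayley's Formula

There are exactly $n^{n-2}$ distinct labelled trees on $n$ nodes. This follows because labelled trees on $n$ nodes are in one-to-one correspondence with Pruefer sequences, and there are $n^{n-2}$ sequences of $n-2$ numbers from $1$ to $n$.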
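For small $n$ the formula can be checked by brute force, independently of Pruefer sequences. The sketch below tries every set of $n-1$ edges on the nodes $1$ to $n$ and counts those that contain no cycle; an acyclic graph with $n$ nodes and $n-1$ edges is necessarily connected, so these are exactly the labelled trees. The names `count_labelled_trees` and `_find` are hypothetical, and the approach is only practical for very small $n$.

```python
from itertools import combinations

def _find(parent, v):
    """Union-find lookup: return the representative of v's component."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]  # path halving keeps the structure shallow
        v = parent[v]
    return v

def count_labelled_trees(n):
    """Count labelled trees on nodes 1..n by testing every set of n-1 edges."""
    nodes = range(1, n + 1)
    all_edges = list(combinations(nodes, 2))
    count = 0
    for edge_set in combinations(all_edges, n - 1):
        parent = {v: v for v in nodes}
        acyclic = True
        for u, v in edge_set:
            ru, rv = _find(parent, u), _find(parent, v)
            if ru == rv:               # u and v are already connected: this edge closes a cycle
                acyclic = False
                break
            parent[ru] = rv            # merge the two components
        if acyclic:
            count += 1
    return count
```

For $n = 4$ there are $\binom{6}{3} = 20$ edge sets of size three, of which the four triangles are rejected, leaving $16 = 4^{2}$ trees.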