Lecture 31: Trees

In this lecture, we will study trees:

  • Basic Definition

  • Properties of Trees with some proofs.

  • Pruefer Sequences: Representing and Counting Labelled Trees using Cayley's Theorem.

  • If time permits, counting rooted/unlabelled trees using Catalan Numbers.

Trees

You have most probably studied rooted trees and binary search trees as a data structure for organizing lists of numbers and efficiently performing insertions and deletions. However, a tree is a useful structure in many other parts of CS:

  • Game trees in AI (how can we make computers play games like chess, go,…).

  • Decision trees in ML.

  • Spanning trees in network routing.

  • Parse trees for compilers.

  • The list goes on and on

Tree (definition)

Let us start with the simplest kind of trees: unrooted and undirected trees.

Definition

An (unrooted) tree G is an undirected graph G = (V,E) such that

  • G is connected (the entire graph is a single connected component),

  • G is acyclic (there are no cycles in G).

A rooted tree G is a connected, acyclic graph with a special node v in V that is called the root of the tree. You may have studied rooted trees in your data structures class. With a root, it is possible to define a parent and children for each node. Without a root, we regard the tree simply as a connected, acyclic graph.
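The definition can be checked mechanically. The following is a minimal sketch, assuming the graph is given as a list of nodes and a list of undirected edges; the function name `is_tree` and the choice to reject the empty graph are illustrative assumptions, not part of the lecture.

```python
from collections import defaultdict

def is_tree(nodes, edges):
    """Return True if the undirected graph (nodes, edges) is connected and acyclic."""
    nodes = list(nodes)
    if not nodes:
        return False  # assumption: the empty graph is not counted as a tree
    adj = defaultdict(set)
    for u, v in edges:
        if u == v or v in adj[u]:
            return False  # a self-loop or a repeated edge already forms a cycle
        adj[u].add(v)
        adj[v].add(u)
    # Depth-first search from an arbitrary node, remembering each node's parent.
    seen = {nodes[0]}
    stack = [(nodes[0], None)]
    while stack:
        u, parent = stack.pop()
        for w in adj[u]:
            if w == parent:
                continue
            if w in seen:
                return False  # reached a visited node by a second route: cycle
            seen.add(w)
            stack.append((w, u))
    # Acyclic so far; the graph is a tree exactly when every node was reached.
    return len(seen) == len(nodes)

print(is_tree([1, 2, 3, 4, 5], [(1, 2), (1, 4), (2, 3), (2, 5)]))  # a tree
print(is_tree([1, 2, 3], [(1, 2), (2, 3), (3, 1)]))                # has a cycle
print(is_tree([1, 2, 3, 4], [(1, 2), (3, 4)]))                     # not connected
```

The search fails as soon as it meets an already visited node that is not the parent (a cycle), and otherwise reports whether all nodes were reached (connectivity).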

Examples

Here are some examples of unrooted trees:

[Figure: three example trees]

Non-Examples

The following graph is not a tree. It has a cycle:

[Figure: a graph containing a cycle]

Here is another example that has more than one connected component and is therefore not a tree:

[Figure: a graph with more than one connected component]

Properties of Trees

Leaves of a Tree

A leaf of an unrooted tree is a node that has degree 1. Let us write down the leaves of the following example trees:

[Figure: example tree]

Leaves are {1,6,7,8,9}.

[Figure: example tree]

Leaves are {1,3,4,5,6,7,8,9}.

[Figure: example tree]

Leaves are {3,4,5}.
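Finding the leaves amounts to counting degrees. Here is a small sketch; the function name `leaves` and the two edge lists are assumptions made for illustration (the figures are not available here), chosen so that their leaves agree with the sets {3,4,5} and {1,3,4,5,6,7,8,9} listed above.

```python
from collections import Counter

def leaves(edges):
    """Return the sorted list of degree-1 nodes of a graph given by its edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d == 1)

# Assumed example: a tree on nodes 1..5 whose leaves are 3, 4 and 5.
small_tree = [(1, 2), (1, 4), (2, 3), (2, 5)]
print(leaves(small_tree))

# Assumed example: a star on nodes 1..9 with center 2.
star = [(2, k) for k in range(1, 10) if k != 2]
print(leaves(star))
```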

Claim Every tree has a leaf.

Proof

If the tree has just one node, we regard that node as (trivially) a leaf. If the tree has two nodes connected by an edge, both nodes are leaves. Let us focus on trees with three or more nodes.

Let us assume that a tree T that does not have a leaf exists. Therefore, for every node v in this tree, $\mathsf{degree}(v) \geq 2$. We are going to show that T has a cycle, yielding a contradiction with the assumption that it is a tree.

Start from any node v_0 in the tree and do a walk as follows:

  • Take any edge out of $v_0$ to reach $v_1$.

  • For any node $v_i$, take an edge other than the edge $\{v_{i-1}, v_i\}$ that we took to enter $v_i$. Since $\mathsf{degree}(v_i) \geq 2$, such an edge is always available.

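Note that the walk above can be continued for arbitrarily many steps. However, since the number of vertices is finite, the walk must eventually repeat a vertex. Let $v_{i+k}$ be the first vertex on the walk that equals an earlier vertex $v_i$ (with $i \geq 0$ and $k > 0$). Because the walk never leaves a node along the edge by which it entered, we have $k \geq 3$, and $v_i, v_{i+1}, \ldots, v_{i+k}$ is a cycle. Therefore T is not a tree, yielding a contradiction.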

Thus, we have concluded that every tree has a leaf.

Number of Edges

Claim A tree with n nodes has n-1 edges.

Proof

Proof is by weak induction on the number of nodes n.

Base Case: Take any tree with 1 node. There is just one such tree and it has 1-1=0 edges.

Inductive Hypothesis: Let us assume that all trees with n nodes have n-1 edges. We will show that all trees with n+1 nodes have n edges.

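Inductive Step: Take some tree T_{n+1} with n+1 nodes. It must have a leaf v. Removing the leaf and its edge gives us a graph with n nodes that is still connected and acyclic (no path between two other nodes can pass through a node of degree 1). It is therefore a tree, and by the inductive hypothesis it has n-1 edges. The leaf itself was connected to the rest of the tree by one edge. Therefore T_{n+1} has n edges.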

Number of Leaves

Claim Any tree with at least two nodes has at least two leaves.

We will prove this in class by the following argument. We already know that any tree has at least one leaf.

Proof

Let us assume that there is a tree with n > 1 nodes and exactly one leaf v.

  • Therefore all nodes other than v have $\mathsf{degree} \geq 2$.

  • Sum of degrees of all nodes $\geq 1 + 2(n-1) = 2n - 1$.

  • However, we know that the sum of degrees of all nodes $= 2 \times$ number of edges.

  • Since the number of edges is an integer, number of edges $\geq \left\lceil \frac{2n-1}{2} \right\rceil = n$.

  • However a tree has precisely n-1 edges.

  • This leads to a contradiction.

Paths in Trees

Claim Let u and v be two nodes in a tree. There is precisely one path from u to v in the tree.

Argument in class Since the tree is connected, there has to be at least one path from u to v.

We will argue that having two distinct paths will necessarily imply that the tree has a cycle which will lead to a contradiction.
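The claim can be observed on a small instance by enumerating every simple path between two nodes. This is only an illustrative sketch, not a proof; the function name `simple_paths` and the example edge list are assumptions. In the assumed tree there is exactly one path from 4 to 5, and adding the extra edge 4-5 (which creates a cycle) produces a second one.

```python
from collections import defaultdict

def simple_paths(edges, u, v):
    """Return every simple path (no repeated node) from u to v in an undirected graph."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    found = []

    def extend(path):
        last = path[-1]
        if last == v:
            found.append(list(path))
            return
        for w in adj[last]:
            if w not in path:  # never revisit a node on the current path
                path.append(w)
                extend(path)
                path.pop()

    extend([u])
    return found

tree = [(1, 2), (1, 4), (2, 3), (2, 5)]   # assumed example tree on nodes 1..5
print(simple_paths(tree, 4, 5))            # exactly one path
print(simple_paths(tree + [(4, 5)], 4, 5)) # the extra edge creates a cycle and a second path
```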

Counting Labelled Trees and Pruefer Sequences

We will now present a very elegant method of representing labelled trees and counting them using Pruefer sequences.

Labelled vs. Unlabelled Trees

We distinguish between instances of trees where the names of the vertices matter and instances where all that matters is the connections between vertices. Trees of the former kind are called labelled trees and those of the latter kind are called unlabelled trees.

Take the following three trees:

[Figure: three labelled trees]

They have the same number of nodes and are "isomorphic" to each other. The tree on the left is obtained from the tree in the middle by exchanging the labels 1 and 4 everywhere. Similarly, the tree on the right is obtained from the tree in the middle by exchanging the labels 2 and 6.

| Trees   | Isomorphism      |
|---------|------------------|
| 1 vs. 2 | 1 <-> 4          |
| 2 vs. 3 | 2 <-> 6          |
| 1 vs. 3 | 1 <-> 4, 2 <-> 6 |

We can view a tree in two ways:

  • Labelled Tree: The labels on the nodes actually matter and two isomorphic trees with different labels are actually different trees. In particular, the above example represents different labelled trees.

  • Unlabelled Trees: The labels do not matter. Isomorphic trees count as the same. In particular, all trees in the above example are the same.

All the examples above represent the following unlabelled tree:

[Figure: the corresponding unlabelled tree]

Pruefer Sequences

Given a labelled tree T, we can represent it by a sequence of numbers called its Pruefer Sequence or its Pruefer Code. Let us assume that the labels for the nodes of the tree are numbers from 1 to n.

The basic idea behind a Pruefer Sequence is to keep removing the leaves one at a time and write down a number for each leaf that we remove:

  • Take the lowest numbered leaf in the tree and remove it.

  • Append to the Pruefer sequence the number of the node that the removed leaf was connected to.

  • Repeat until just two vertices remain.
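The three steps above can be sketched as follows. This is a minimal sketch under the lecture's assumption that the nodes are labelled 1 to n; the function name `pruefer_sequence`, the edge-list input format and the use of a heap to find the lowest numbered leaf are illustrative choices, and the edge list in the usage line is an assumed example.

```python
import heapq
from collections import defaultdict

def pruefer_sequence(n, edges):
    """Return the Pruefer sequence of a labelled tree on nodes 1..n."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Min-heap of the current leaves, so the lowest numbered leaf is popped first.
    leaf_heap = [v for v in range(1, n + 1) if len(adj[v]) == 1]
    heapq.heapify(leaf_heap)
    sequence = []
    for _ in range(n - 2):            # stop when just two vertices remain
        leaf = heapq.heappop(leaf_heap)
        neighbor = adj[leaf].pop()    # a leaf has exactly one neighbor
        adj[neighbor].remove(leaf)    # remove the leaf from the tree
        sequence.append(neighbor)
        if len(adj[neighbor]) == 1:   # the neighbor may have become a leaf
            heapq.heappush(leaf_heap, neighbor)
    return sequence

# Assumed example: a tree on nodes 1..5 with edges 1-2, 1-4, 2-3, 2-5.
print(pruefer_sequence(5, [(1, 2), (1, 4), (2, 3), (2, 5)]))
```

Each removal appends exactly one number, so a tree on n nodes yields a sequence of length n-2.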

Example

Let us write down the Pruefer sequence for this tree:

[Figure: tree on nodes 1 to 5 with edges 1-2, 1-4, 2-3, 2-5]

To start with, the smallest numbered leaf is 3. We remove 3 from the tree and add 2 to the Pruefer sequence. We get the tree:

[Figure: the tree after removing leaf 3]

The smallest numbered leaf is now 4, which is connected to 1. We remove 4 and add 1 to the Pruefer sequence, to get the tree:

[Figure: the tree after removing leaf 4]

The smallest numbered leaf is now 1, which is connected to 2. Therefore, we remove 1 and add 2 to the Pruefer sequence.

Only two vertices (2 and 5) now remain, so the process stops and the Pruefer sequence is 2,1,2.

Example-2

Let us try other trees and write down their Pruefer sequences.

[Figure: star tree with center 2 and leaves 1,3,4,5,6,7,8,9]

The Pruefer sequence is 2,2,2,2,2,2,2.

Example-3

[Figure: tree on nodes 1 to 9]

The Pruefer sequence is 2, 5,5,4,3,3,2.

Reconstructing Trees From Pruefer Code

Let us now figure out what a valid Pruefer Sequence is and how to reconstruct a tree uniquely from its Pruefer Sequence.

Pruefer Sequences

For a tree with n nodes assumed to be labelled 1,2,...,n, its Pruefer sequence has

  • n-2 numbers in the sequence.

  • Each number is one of 1...n.

  • In fact, every sequence of n-2 numbers from 1...n is the Pruefer sequence of exactly one tree.

Reconstruction Algorithm: Example

Let us try and reconstruct a tree from its given Pruefer sequence: 2, 5,5,4,3,3,2

Q1 How many nodes does the tree have? 9, since the sequence has 7 = n-2 numbers.

At each step, the algorithm looks for the smallest number that has not yet been resolved and does not appear in the remaining part of the Pruefer sequence.

  • The smallest number that does not appear in the sequence is 1.

  • Therefore we conclude that 1 must be the first leaf removed and that it is connected to 2.

The remaining sequence is 5,5,4,3,3,2. We remember that 1 has been resolved: E={(1,2)}.

  • The smallest unresolved number that does not appear in the remaining sequence is 6.

  • Therefore, we conclude that 6 must have been a leaf at this stage and is connected to 5.

The remaining sequence is 5,4,3,3,2. We remember that 1,6 have been resolved: E={(1,2), (5,6)}.

  • The smallest unresolved number that does not appear is 7, and it must be connected to 5.

The remaining sequence is 4,3,3,2. We remember that 1,6,7 have been resolved: E={(1,2), (5,6), (5,7)}.

  • The smallest unresolved number that does not appear is now 5 (it no longer occurs in the remaining sequence), and it must be connected to 4.

The remaining sequence is 3,3,2. We remember that 1,5,6,7 have been resolved: E={(1,2), (5,6), (5,7), (4,5)}.

Here is the complete run of the algorithm:

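| Remaining Sequence | Resolved Nodes | Smallest number not seen so far | Edge Inferred |
|--------------------|----------------|---------------------------------|---------------|
| 2,5,5,4,3,3,2      | -              | 1                               | (2,1), (1,2)  |
| 5,5,4,3,3,2        | 1              | 6                               | (6,5), (5,6)  |
| 5,4,3,3,2          | 1,6            | 7                               | (5,7), (7,5)  |
| 4,3,3,2            | 1,6,7          | 5                               | (5,4), (4,5)  |
| 3,3,2              | 1,5,6,7        | 4                               | (4,3), (3,4)  |
| 3,2                | 1,4,5,6,7      | 8                               | (8,3), (3,8)  |
| 2                  | 1,4,5,6,7,8    | 3                               | (3,2), (2,3)  |
| -                  | 1,3,4,5,6,7,8  | -                               | (2,9), (9,2)  |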

The last step is special: the Pruefer sequence is exhausted and exactly two nodes (2 and 9) remain unresolved, so we conclude that they must form an edge.
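The run above can be reproduced with a short program. This is a minimal sketch, assuming labels 1 to n with n equal to the sequence length plus 2; the function name `tree_from_pruefer` and the bookkeeping array `remaining` are illustrative choices rather than part of the lecture.

```python
import heapq

def tree_from_pruefer(sequence):
    """Rebuild the edge list of the labelled tree with the given Pruefer sequence."""
    n = len(sequence) + 2
    # remaining[v] = how many times v still occurs in the unread part of the sequence
    remaining = [0] * (n + 1)
    for v in sequence:
        remaining[v] += 1
    # Unresolved nodes that do not occur in the remaining sequence, smallest first.
    candidates = [v for v in range(1, n + 1) if remaining[v] == 0]
    heapq.heapify(candidates)
    edges = []
    for v in sequence:
        leaf = heapq.heappop(candidates)  # smallest unresolved number not in the rest
        edges.append((leaf, v))           # it was the leaf removed at this step
        remaining[v] -= 1
        if remaining[v] == 0:             # v no longer occurs, so it becomes a candidate
            heapq.heappush(candidates, v)
    # Special last step: the two nodes left unresolved form the final edge.
    u = heapq.heappop(candidates)
    w = heapq.heappop(candidates)
    edges.append((u, w))
    return edges

print(tree_from_pruefer([2, 5, 5, 4, 3, 3, 2]))
```

For the sequence 2,5,5,4,3,3,2 the edges come out in the same order as the rows of the table, ending with the edge between 2 and 9.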

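Cayley's Formula

There are exactly n^{n-2} distinct labelled trees on n nodes. This follows from the correspondence above: every labelled tree has a Pruefer sequence of n-2 numbers from 1...n, and every such sequence reconstructs to exactly one tree.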
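For small n the formula can be checked by brute force, independently of Pruefer sequences. The sketch below tries every set of n-1 edges on nodes 1 to n and counts the acyclic ones. It relies on the earlier claim that a tree has n-1 edges, together with the fact that an acyclic graph with n nodes and n-1 edges is necessarily connected. The names `count_labelled_trees` and `find` are illustrative, and the approach is only practical for very small n.

```python
from itertools import combinations

def find(parent, v):
    """Union-find lookup with path halving: return the representative of v's component."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v

def count_labelled_trees(n):
    """Count the labelled trees on nodes 1..n by testing every set of n-1 edges."""
    nodes = range(1, n + 1)
    all_edges = list(combinations(nodes, 2))
    count = 0
    for edges in combinations(all_edges, n - 1):
        parent = {v: v for v in nodes}
        acyclic = True
        for u, v in edges:
            ru, rv = find(parent, u), find(parent, v)
            if ru == rv:      # u and v are already connected: this edge closes a cycle
                acyclic = False
                break
            parent[ru] = rv   # merge the two components
        if acyclic:
            count += 1
    return count

for n in range(2, 6):
    print(n, count_labelled_trees(n), n ** (n - 2))
```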