MST (Minimum Spanning Tree) Discussion
This discussion complements Assignment #6.
Definition
Let G = (V,E) be an undirected graph. G is composed of a set of vertices and a set of edges. Let n be the number of vertices and m be the number of edges of G, and let the vertices be designated by the numbers, 1,2,...,n. A weighted graph is an undirected graph in which each edge e also has an associated weight, which we will assume is a positive integer. So we will use the triple e = (w,i,j) to designate an edge, where w is the weight of e, i is the first vertex of e, and j is the second vertex of e.
If there is a path between every two vertices i and j, G is called connected. Otherwise G is disconnected.
Input Format
Graphs for input to the program are specified in text files. An illustration of how to read integers from a text file is contained in ReadInts.java in the examples directory. Graph files can be created by hand using a text editor. There are also two programs called MakeGraph.java and MakeGraph2.java in the examples directory which can be used to create sparse and dense graphs of arbitrary sizes. The input to specify a graph is formatted as follows: The first line contains two integers, n and m, which give the number of vertices in the graph and the number of edges, respectively. Each line that follows specifies a single edge. Each edge is defined by three integers which give, in order, the weight, the first vertex, and the second vertex of a given edge. Thus the input for the graph shown in Fig 1 would be as follows:
7 12 8 1 2 12 1 6 5 1 5 4 2 3 2 2 4 6 5 4 4 5 6 3 4 3 6 6 4 7 7 3 5 7 4 1 6 7
Edge List Representation
A graph G can be represented by an edge list. An edge list is an array, which we will call EdgeList, containing m items so that each item corresponds to an edge. Each item will have three data members of type int: the weight, the first vertex and the second vertex of the edge. So in our case each item will be an instance of class Edge. The edge list of the graph in Fig 1 is illustrated below.
| Weight | 8 | 12 | 5 | 4 | 2 | 6 | 4 | 3 | 6 | 7 | 5 | 1 |
| First | 1 | 1 | 1 | 2 | 2 | 5 | 5 | 4 | 6 | 7 | 7 | 6 |
| Second | 2 | 6 | 5 | 3 | 4 | 4 | 6 | 3 | 4 | 3 | 4 | 7 |
Adjacency List Representation
A graph G can also be represented by an adjacency list. This representation is slightly more complex than an edge list, but is more useful in most applications. For each vertex i (1 ≤ i ≤ n), all edges incident to vertex i are stored in a list. Each item in list(i) corresponds to an edge incident to i and has three integer data members: the weight, the first vertex, and the second vertex of the edge. So in our case we will have an array ADJ[1..n] of such lists, each list being a queue of edges. So each ADJ[i] is basically a reference to the first item in list(i). Note that an edge e = (w,i,j) will have two items corresponding to it: one in list(i) and another one in list(j).
The adjacency list representation of the graph shown in Fig 1 is given immediately below the figure.
Minimum Spanning Tree (MST)
A spanning tree of G is a tree T in G that contains every node of G. Every connected graph G has a spanning tree. The weight of T is the sum of the weights of all edges in T. If the weight of T is minimum among all spanning trees of G, T is called a minimum spanning tree (MST) of G. In Fig 1, a MST, T, of G is indicated by thick lines. The weight of T is 20.
The MST problem: Given a graph G, find a MST of G.
The following is one of many applications of the MST problem. Suppose that each vertex of G is a computer site. The edges of G are the possible links between the sites. The weight w of an edge e = (w,i,j) is the cost of building the link between the site i and site j. Our goal is to set up the links between the sites so that all computer sites are connected, and the cost of building the links is minimized. To solve this problem, we only need to find a MST of G.
We will discuss two algorithms for solving the MST problem: Kruskal's algorithm and Prim's algorithm. These two algorithms are not complicated. To implement them, however, it is necessary to make several data structures work together.
Kruskal's Algorithm
The edge list representation of G is used by this algorithm. The basic idea is as follows. Let T be a list which will contain the edges of the MST. Initially T is empty. We treat each vertex of G as isolated (namely, each vertex is a connected component). We process the edges of G in the order of increasing weights. When an edge e = (w,i,j) is processed, we look to see if e connects two different connected components. If yes, e is added into T. If no (i.e. the two end vertices i and j are in the same connected component), then we discard e. The following is the outline of the algorithm.
Kruskal's Algorithm: 1. initialize an empty list (queue) T; 2. declare each vertex i (1 ≤ i ≤ n) of G to be a connected component containing only i itself; 3. sort EdgeList according to Weight by increasing order; 4. process the edges in EdgeList (in increasing Weight order) one-by-one: 4.1 let e=(w,i,j) be the edge being processed; 4.2 if i and j are in the same connected component; do nothing; 4.3 if i and j are not in the same connected component then 4.3.a put e=(w,i,j) into T; 4.3.b merge the connected component containing i and the connected component containing j into a bigger connected component. 5. T is a MST of G; output T.
Note: The above algorithm can also be used to test if G is connected or not as follows. At step 5, we count the number N of edges in T. If N = n-1, then G is connected and T is a MST of G. If N < n-1, then G is not connected.
When we run Kruskal's algorithm on the graph G shown in Fig 1, the following edges are added into T in this order:
1 6 7 2 2 4 3 4 3 4 5 6 5 1 5 5 7 4
In addition to an edge list, we also need the following data structures for this algorithm:
A) A list (queue) T containing the edges in the MST.
B) A structure, called a UnionFindSet (the Disjoint Set ADT in Weiss), so that the steps 4.2, 4.3 and 4.3.b can be done efficiently. During the execution of Kruskal's algorithm, the vertex set 1,2,...,n of G is partitioned into disjoint connected components. Using a UnionFindSet structure helps us keep track these connected components.
UnionFindSet can be written as a simple class. The basic data member of the class is an array A[1..n] (which means the array must be dimensioned as n + 1) in which each cell A[i] corresponds to vertex i of G. The class UnionFindSet needs only three methods, one of which can be made the constructor for the class:
UnionFindSet(); int Find(int i); void Union(int a, int b);
The way a UnionFindSet works is that the vertices of a connected component of G are arranged conceptually into a tree, so that the root of the tree is one of the vertices (this vertex serves to "label" the tree). If vertex i is the root of a tree we indicate this by letting A[i] = i. Otherwise vertex i is not the root of any tree, and the value contained in A[i] denotes the index corresponding to the parent of i in some tree.
At step 4.2, above, we declare each vertex i (1 ≤ i &le: n) to be a connected component. Each component is then a tree with only one node i. This is accomplished by simply making A[i] = i, which can be done by the class constructor:
UnionFindSet()
{ for (int i=1; i<=n; i++) A[i]=i;}
In steps 4.2 and 4.3, we need to find out whether i and j are in the same connected component or not. So we need to find the name of the connected component containing each vertex. This is done by the Find(i) function. It returns the name or label of the connected component containing i, (i.e., the root vertex of the tree corresponding to the connected component).
int Find(int i)
{ while (A[i] != i) { i = A[i];}
return i;
}
Two vertices i and j are in the same connected component iff Find(i) == Find(j).
In step 4.3.b, we need to merge two connected components into a bigger one. This is done by the Union(a,b) function. This function assumes that a and b are in different connected components and that both a and b are the roots of their corresponding trees (viz., a = A[a], b = A[b]; but a != b.) The merging of the two connected components can be done by simply setting the parent of a to be b, as shown below:
void Union(int a, int b)
{ A[a]=b;
}
Note: You may want to improve on this approach by using union-by-rank
and/or path compression as discussed in chapter 8 of Weiss.
Prim's Algorithm
The adjacency list representation, ADJ, is used for this algorithm.
The basic idea of the algorithm is as follows. T is a list (queue) which will contain the edges of the MST. At the beginning, T is empty. We maintain a subset S of vertices V. At the beginning, S only contains one vertex r (usually r = 1). During the execution of the algorithm, we add one vertex into S and one edge into T at a time. Intuitively, we are growing a MST from a single vertex r. S is the set of the vertices in the current tree.
We maintain a set CrossEdges of edges. This set contains all edges e = (w,i,j) such that i is an element of S and j is an element of V-S. We find the edge e = (w,i,j) in CrossEdges with minimum weight w (and remove it from CrossEdges); add e into T and add j into S. In this way, all edges (w,j,k) in list(j) such that k is not in S are inserted into CrossEdges.
As can be seen, the set CrossEdges requires the following operations: Insert; FindMin; and RemoveMin. Since these are the operations provided by a priority queue, we should implement CrossEdges as a binary heap.
The following is the outline of the algorithm:
Prim's Algorithm: 1. initialize an empty list (queue) T; 2. initialize an empty binary heap CrossEdges; 3. initialize an empty set S; 4. put r into S; 5. insert all edges in ADJ[r] into CrossEdges; 6. while (CrossEdges is not empty) 6.1 Let e=(w,i,j) be the edge in CrossEdges with minimum weight; 6.2 Remove e from CrossEdges; 6.3 if j is in S; do nothing; 6.4 if j is not in S; do: 6.3.a put j into S; 6.3.b put e=(w,i,j) into T; 6.3.c for each edge e=(w,j,k) in ADJ[j] if k is not in S, Insert (w,j.k) into CrossEdges; 7. T is a MST of G;
Note 1: In this implementation, CrossEdges will also contain some edges between the vertices in S. These edges are ignored by the algorithm (at step 6.3).
Note 2: Similar to Kruskal's algorithm, the above algorithm can also be used to test if G is connected or not as follows: at step 7, count the number N of edges in T. If N = n-1, then G is connected and T is a MST of G. If N < n-1, then G is not connected.
When we run Prim's algorithm on the graph G shown in Fig 1, the following edges are added into T in this order:
5 1 5 4 5 6 1 6 7 5 7 4 2 4 2 3 4 3
The following data structures are needed by this algorithm.
A) The adjacency list ADJ[1..n].
B) The queue T.
C) A binary heap, CrossEdges.
D) A data structure, called BitSet, to represent the set S.
The basic data member of BitSet is an array B[1..n] of boolean. The entry B[i] corresponds to the vertex i. If B[i] = true then i is in S. If B[i] = false, then i is not in S.
We need the following functions for BitSet:
BitSet()
// a constructor to make an empty set S
{ for (int i=1; i<= n; i++) B[i]=false;
}
void PutIn(int i)
// put i into S;
{ B[i]=true;
}
int IsIn(int i)
// test if i is in S or not; if i is in S, return true, if not return false
{ return B[i];
}
Figure 1

Updated November 26, 2007