ebook img

Grafalgo - A Library of Graph Algorithms and Supporting Data Structures (revised) PDF

0.26 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Grafalgo - A Library of Graph Algorithms and Supporting Data Structures (revised)

6 1 0 2 n a J 7 ] S Grafalgo - A Library of Graph D . s Algorithms and Supporting c [ Data Structures (revised) 1 v 7 9 Jonathan Turner 5 ([email protected]) 1 0 WUCSE-2016-01 . 1 0 6 1 Abstract : v This report provides an (updated) overview of Grafalgo, an open- i Xsource library of graph algorithms and the data structures used to im- rplement them. The programs in this library were originally written to a supportagraduateclassinadvanceddatastructuresandalgorithmsat Washington University. Because the code’s primary purpose was ped- agogical, it was written to be as straightforward as possible, while still being highly efficient. Grafalgo is implemented in C++ and incorpo- rates some features of C++11. The library is available on an open-source basis and may be down- loaded from https://code.google.com/p/grafalgo/. Source code documentation is at www.arl.wustl.edu/~jst/doc/grafalgo. While notdesignedasproductioncode,thelibraryissuitableforuseinlarger systems, so long as its limitations are understood. The readability of the code also makes it relatively straightforward to extend it for other purposes. Grafalglo includes implementations of algorithms for a variety of classi- cal graph optimization problems. These include the minimum spanning tree problem, the shortest path problem, the max flow problem, the min-cost flow problem, the graph matching problem and the edge coloring problem 1 for bipartite graphs. Multiple algorithms are provided for each problem, il- lustratingavarietyofdifferentapproaches. Whileallthealgorithmsincluded hereareefficientenoughtobereasonablecandidatesinpracticalapplications, some are clearly better than others. Often the most sophisticated methods, with the best performance from a theoretical perspective, are not the best choice for real applications. Still, it’s instructive to study the techniques on which these algorithms are based, and the implementations provided here can aid in such study. This report does not attempt to describe the algo- rithms and data structures in detail. Readers may find more information in standard texts, including [1] and [5], as well as in the online documentation and source code. This report is organized mainly by problems. We start with a brief de- scription of a few of the basic data structures, then proceed to a discussion of the minimum spanning tree problem and the algorithms provided to solve it. Subsequent sections address different problems and the algorithms used to solve them. 1 Basic Data Structures TheGrafalgolibraryusesindex-baseddatastructures. Thesearedatastruc- tures in which the items of interest are represented by integers in a restricted range. For example, in a graph, we identify vertices by vertex numbers and edges by edge numbers. Index-based data structures have some advantages over data structures that use pointers (or object references) to identify the objects of interest. One key advantage is that the same integer index val- ues can be used in several related data structures. For example, in some applications it’s useful to define several graphs on the same vertex set. If all graphs use the same vertex numbers, it easy to relate the vertices in the different graph objects. Using pointer-based data structures, we would need some other explicit mechanism to relate the vertex objects in one graph to the corresponding vertex objects in another. This can certainly be done, but it’s cumbersome when compared to the use of shared index values. The use of indexes also makes it easy to associate properties of interest with the vertices or edges of a graph. These can simply be stored in separate tables, 2 indexed by the same vertex and edge numbers used in the graph. Index-based data structures also allow some operations to be imple- mented more efficiently than they can be using the equivalent pointer-based data structures. Let’s illustrate this with a simple index-based list, defined on the index set 1,...,n. Such a list can represent any subset of the index set, with the indexes arranged in any order. It is implemented by a simple array called next. For any index x in the list, next[x] is the next index in the list following x, or 0 if x is the last index in the list. For any index x that is not currently in the list, we define next[x] = −1. Figure 1 shows how the list [7, 5, 3, 8, 2] is represented. Note that this representation allows us to 1 2 3 4 5 6 7 8 9 10 next –1 0 8 –1 3 –1 5 2 –1 –1 Figure 1: Index-based list implementaconstant-timemembershiptestforindex-valuesinthelist,while a conventional list representation requires linear time. To iterate through a list, we use the first and next methods. for (index x = alist.first(); x != 0; x = alist.next(x)) {..} We frequently use this data structure to represent a list of vertices in a graph. In this case, the index values in the list object correspond directly to the vertex numbers in the graph. Grafalgo implements this data structure as the List class. There is also a doubly-linked version, called List d. Now,onemightwellobjectthattheselistsarelimited,inthattheydonot allowvaluestoappearmorethanonceinagivenlist. Morever,thesizeofthe objectmustbeaslargeasitslargestindex, whichmaybeconsiderablylarger thanthenumberofitemsinthelist. Thesearecertainlyvalidpoints, andfor those situations where these drawbacks are significant, Grafalgo provides a more generic list data structure called List g. This is a template-based data structure, allowing one to construct lists of arbitrary items. Grafalgo also includes a data structure that represents a set of disjoint lists on the underlying index set. Each non-empty list has a distinguished 3 index called its identifier. The data structure is implemented using two arrays next and prev. For each index x, next[x] is the next index in the list containing x (or 0 if x is the last index in the list), while prev[x] is the previous index in the list containing x (or, the last index if x is the first). An example, representing the collection {[1,3,6],[2,7],[4],[5,10,12],[8],[9,11]} is shown in Figure 2. Note that every index in the index set belongs to 1 2 3 4 5 6 7 9 8 10 11 12 next 3 7 6 0 10 0 0 11 0 12 0 0 prev 6 7 1 4 12 3 2 11 8 5 9 10 Figure 2: Set of disjoint lists some list, although some of these lists are singletons. This data structure is implemented by the Dlists class. Now, let’s look at the representation of the Graph class used to represent undirected graphs. An example is shown in Figure 3. The diagram at the a 2 1 3 b 4 left right 7 8 1 a b 5 2 3 4 9 es 2 a c oint ba 28 c 10 5 6 6 12 e edg 345 dbc dae st endp dc 153 11 d 13 r 6 e d fi e 9 adjLists 2 3 4 5 6 7 8 9 10 11 12 13 next 7 0 0 10 0 4 3 12 0 6 0 11 prev 4 8 7 10 11 2 3 12 5 13 9 6 Figure 3: Graph data structure top left shows the five vertices in the graph (identified by letters rather than integer indexes) along with the edges (identified by edge numbers). We also 4 associate two edge endpoint numbers with each edge. Specifically, edge e has endpoints numbered 2e and 2e+1. The diagram includes these endpoint numbers and shows how they are distributed among a set of adjacency lists (forexample, thelistforvertexais[2, 7, 4], correspondingtothethreeedges 1, 3 and 2 that are incident to a). The adjacency lists are implemented by a Dlists object that partitions the endpoint numbers among the adjacency lists. A first endpoint array identifies the first endpoint for each vertex. Finally, we have an array that identifies the left and right endpoints of each edge. We can iterate through the vertices and edges of a graph as follows. for (vertex u = 1; u <= g.n(); u++) { for (edge e = g.firstAt(u); e != 0; e = g.nextAt(u,e) { ... } } The Graph class also includes methods for iterating through all the edges of the graph, and methods for creating and removing edges. All data structures in Grafalgo include a toString method that produces a printable representation of the data structure (for example, the string “[13 30 22]” is the string represention of a List object for indexes 13, 30 and 22). All the data structures in the library also define a stream output operator. So for example, cout << myGraph; converts the object myGraph to a string and sends it to the standard output stream. For small instances of a data structure, the toString method converts index values to lower-case letters. So for example, the list [1, 3, 4] is shown as the string “[a c d]”, and the graph in the earlier example is shown as 5 { [a: b c d] [b: a e] [c: a d] [d: a c e] [e: b d] } Here, each line represents the adjacency list of the vertex listed before the ‘:’ and the remaining vertices are its neighbors (neighbors are listed repeatedly when there are multiple edges joining the same vertices). Letters are used in place of numeric indexes for any data structure defined on an index set 1,...,n, where n ≤ 26. This makes small examples easier for human readers tounderstand,especiallyfordatastructuresthatincludeothernumericdata, such as edge weights or key values. For data structures using larger index sets, the numeric index values are used in the string representation. 2 Minimum Spanning Tree Problem The objective of the minimum spanning tree problem is to find a spanning tree of an edge-weighted graph that has the smallest total weight. So for example, the bold edges in Figure 4 represent a minimum spanning tree of the graph shown. a 9 c 2 7 e b 6 5 1 3 d 2 f Figure 4: Minimum spanning tree Grafalgo contains several algorithms that solve the minimum spanning tree problem (mst). All have two arguments, a weighted graph (Graph w) 6 and a list in which the result is returned. Specifically, on return, the list contains the edge numbers of the edges in the minimum spanning tree. Prim’s algorithm (mst p) sovles the minimum spanning tree problem in O(mlog n) time. For dense graphs, this is O(m), which is optimal. 2+(cid:98)m/n(cid:99) For very sparse graphs (m = Θ(n)), the running time is O(mlogn), which falls a little short of optimal. Prim’s algorithm uses a d-heap to guide the selection of edges to be included in the tree. The Heap d class represents a setofitems, eachwithanumerickey. TheHeap dobjectisusedtoguidethe selection of edges to be included in the tree, where the keys correspond to edge weights in the graph. The Heap d class is implemented as a template, allowing different key types to be used in different applications. AsecondversionofPrim’salgorithm(mst pf)thatusesaFibonacciheap in place of the d-heap is also provided. This leads to a worst-case running timeofO(m+nlogn)whichisbetterforsparsegraphs. However,inpractice, the relative simplicity of the d-heap data structure makes the first version faster under most conditions. Fibonacci heaps do have some nice features relative to d-heaps. In particular, the Fheap class represents a collection of disjint heaps that can be efficiently combined (called melding), something that cannot be done using d-heaps. It shares this property with other meld- able heaps. Kruskal’salgorithm(mst k)findsaminimumspanningtreeinO(mlogn) time. Its running time is determined by an initial step which sorts the edges by weight. If the edges happen to be pre-sorted or can be sorted using radix sort, then Kruskal’s algorithm runs in O(mα(m,n)) where α is a very slowly growing function (it is inversely related to Ackerman’s function). It builds the minimum spanning tree by scanning edges in order of their weight and includinganyedgethatdoesnotcreateacycleamongthetreeedgesselected so far. It uses a disjoint sets data structure (Dsets) to maintain a partition over the vertices in the graph. This is used to efficiently determine if an edge joins two vertices that are already connected by a path consisting of tree edges. (The disjoint sets data structure is often referred to as the union-find data structure.) The Cheriton-Tarjan (mst ct) algorithm runs in O(mloglogn) time. For very sparse graphs, this yields the best overall performance among the algo- 7 rithms included in Grafalgo, although the extra overhead of its data struc- tures prevents it from out-performing Prim’s algorithm in typical applica- tions. Like Kruskal’s algorithm, it uses a Dsets data structure to determine if two vertices are in a common subtree of the forest defined by the tree edges selected so far. It also uses a leftist heap data structure (Mheaps l) to represent the edges incident to each subtree in the current forest. This section of the library also includes a program called testMst that can be used to compute a minimum spanning tree on a given graph, using a specified algorithm. A separate program called randGraph can be used to generate random weighted graphs that serve as input to testMst. So for example, the command randGraph wgraph 6 8 1 9 1 0 produces the output { [a: e(9) f(9)] [b: c(7) d(2) e(8) f(9)] [c: b(7) f(9)] [d: b(2)] [e: a(9) b(8) f(7)] [f: a(9) b(9) c(9) e(7)] } The first argument to randGraph is the type of graph (possiblities include ugraph, bigraph, tree, wgraph, digraph, dag and flograph among others). The second and third arguments specify the number of vertices and edges in the graph. The next two specify the range of edge weights to be used. The next argument is the seed for the random number generator and a non-zero valueforthelastargumentspecifiesthatthevertexandedgenumbersshould be randomly scrambled (this can be useful in situations where the default numbering might be exploited by an algorithm to improve its performance). The command randGraph wgraph 6 8 1 9 1 0 | testMst kruskal show verify produces the output 8 mst weight: 33 { [a: e(9) f(9)] [b: c(7) d(2) e(8) f(9)] [c: b(7) f(9)] [d: b(2)] [e: a(9) b(8) f(7)] [f: a(9) b(9) c(9) e(7)] } (b,d,2) (e,f,7) (b,c,7) (b,e,8) (a,f,9) The list of edges at the bottom defines the minimum spanning tree. The first argument to testMst specifies the algorithm to use, the optional second argument requests that the graph and spanning tree be output (if omitted, only the mst weight is output), the optional third argument requests that the spanning tree be checked for validity. Using these two programs, one can write simple scripts that test a given algorithmonawidevarietyofsamplegraphs,andautomaticallycheckthere- sults for correctness. Another program called timeMst can be used to obtain basic timing measurements of a specified algorithm, when run repeatedly on different random graphs. 3 Shortest Paths The shortest path problem involves determining minimum length paths in a directed graph with numeric edge lengths. There are several variants of the problem. Grafalgo includes algorithms for the single-source version of the problem and the all pairs version. Two algorithms are implemented for the single-source problem, Dijk- stra’s algorithm (spt d) and the Bellman-Moore algorithm (spt bm). Dijk- stra’s algorithm is implemented using a d-heap and has a running time of O(mlog n), but is restricted to graphs with non-negative edge lengths. 2+m/n TheBellman-Moorealgorithmcanhandlegraphswithnegativeedgelengths, 9 it requires only a simple queue and and runs in O(mn) time. Both take four arguments, a weighted directed graph object (Graph wd), a source vertex and two arrays used to return the results of the computation. The parent- edge array specifies the edge connecting each vertex to its parent in the shortest path tree, while the distance array specifies the shortest path dis- tance from the source. Grafalgoalsoincludestwoalgorithmsfortheall-pairsversionoftheprob- lem, Floyd’s algorithm (apsp f) and the Edmonds-Karp algorithm (apsp ek), both of which can handle negative edge lengths. Floyd’s algorithm runs in O(n3) time, while the Edmonds-Karp algorithm runs in O(mnlog n) 2+m/n time. Both return a 2-d array of distances, plus a second array that defines the actual shortest paths. There are also several utilities: testSpt, testApsp, timeSpt and timeApsp that can be used to demonstrate correct operation and generate timing in- formation. For example, the command randgraph wdigraph 6 15 1 9 5 0 | testSpt dijkstra show verify produces the output distance sum is 25 { [a: b(5) c(2) d(4)] [b: f(8)] [c: b(7) d(5) e(1)] [d: b(1) c(2)] [e: a(1) c(7) d(6) f(8)] [f: b(3) d(1)] } 0 5 2 4 3 11 (a,b,5) (a,c,2) (a,d,4) (c,e,1) (e,f,8) where the last line lists the edges in a shortest path tree with source vertex a, while the preceding line gives the distance of the vertices from a. 10

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.