Graph Algorithms in Ruby

SitePoint_Articles · May 23, 2015, 12:01am

An excerpt from http://www.sitepoint.com/bring-devops-spirit-non-engineers/, by Dhaivat Pandya

A lot (read: most) of Rubyists are focused on one aspect of software engineering: web development. This isn’t necessarily a bad thing. The web is growing at an incredible rate and is definitely a rewarding (monetarily and otherwise) field in which to have expertise. However, this does not mean that Ruby is good just for web development.

The standard repertoire of algorithms is pretty fundamental to computer science and having a bit of experience with them can be incredibly beneficial. In this article, we’ll go through some of the most basic graph algorithms: depth first search and breadth first search. We’ll look at the ideas behind them, where they fit in with respect to applications, and their implementations in Ruby.

Terminology

Before we can really get going with the algorithms themselves, we need to know a tiny bit about graph theory. If you’ve had graph theory in any form before, you can safely skip this section. Basically, a graph is a group of nodes with some edges between them (e.g. nodes representing people and edges representing relationships between the people). What makes graph theory special is that we don’t particularly care about the Euclidean geometrical structure of the nodes and edges. In other words, we don’t care about the angles they form. Instead, we care about the “relationships” that these edges create. This stuff is a bit hand-wavy at the moment, but it’ll become clear as soon as we look at some concrete examples:

Alright, so there: we have a graph. But, what if we want a structure that can represent the idea that “A is related to B but B isn’t related to A”? We can have directed edges in the graph.

Now, there is a direction to go with each relationship. Of course, we can create a directed graph out of an undirected graph by replacing each undirected edge with two directed edges going opposite ways.

The Fundamental Problem

Say we’re given a given a directed graph (G) and two nodes: (S) and (T) (usually referred to as the source and terminal). We want to figure out whether there is a path between (S) (T). Can we can get to (T) by the following the edges (in the right direction) from (S) to (T)? We’re also interested in what nodes would be traversed in order to complete this path.

There are two different solutions to this problem: depth first search and breadth first search. Given the names and a little bit of imagination, it’s easy to guess the difference between these two algorithms.

The Adjacency Matrix

Before we get into the details of each algorithm, let’s take a look how we can represent a graph. The simplest way to store a graph is probably the adjacency matrix. Let’s say we have a graph with (V) nodes. We represent that graph with a (V x V) matrix full of 1’s and 0’s. If there exists an edge going from node [i] to node [j], then we place a (1) in row (i) and row (j). If there’s no such edge, then we place a (0) in row (i) and row (j). An adjacency list is another way to represent a graph. For each node (i), we setup lists that contain references to the nodes that (i) has an edge to.

What’s the difference between these two approaches? Say we have a graph with 1000 nodes but only one edge. With the adjacency matrix representation, we’d have to allocate a (1000*1000) element array in order to represent the graph. On the other hand, a good adjacency list representation would not need to represent outputs for all for the nodes. However, adjacency lists have their own downsides. With the traditional implementation of linked lists, it takes linear time in order to check if a given edge exists. For example, if we want to check if edge (4, 6) exists, then we have to look at the list of outputs of 4 and then loop over it (and it might contain all the nodes in the graph) to check if 6 is part of that list.

On the other hand, checking if row 4 and column 6 is 1 in a adjacency matrix takes a constant amount of time regardless of the structure of the graph. So, if you have a sparse graph (i.e. lots of nodes, few edges), use an adjacency list. If you have a dense graph and are doing lots of existence checking of edges, use an adjacency matrix.

For the rest of the article, we’ll be using the adjacency matrix representation mostly because it is slightly simpler to reason about.

Continue reading this article on SitePoint!

ralphm · August 22, 2015, 7:08am

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.