This is an old revision of the document!
Simple Algorithm Prototype
Overview
While cycle detection is a fundamental algorithm in Graph Theory neither the WebGraph or Graph Databases include it as a built in feature.
For the Graph Databases there are libraries that implement cycle detection and the documentation includes examples to show how it can be achieved. The documented approaches discover duplicate cycles that then have to be filtered out.
Using the CLIPS rule engine it is possible to implement a simple cycle detection algorithm that can be clearly mapped to the mathematical description of a cycle. It performs acceptably without optimization and supports incremental detection with minimum effort.
Simple Cycle Detection
Defining a Simple Cycle
The goal of this prototype is to implement a simple algorithm for detecting simple cycles within a graph.
Let G = (V,E) be a directed graph (G) consists of a finite set of vertices (V) and a finite set of edges (E). Each edge describes a relationship v1 → v2 where v1 and v2 are distinct members of V.
Within G a cycle of length l is a sequence of vertices [v0,v1,…,vl] where vi]-1 → i] for i = 1, 2, …., l-1 and vl → v1.
The cycle is simple when each vertex in [v0,v1,…,vl] are distinct, so that vi ≠ vj for i ≠ j.
Simple Cycle Detection using standard algorithms
Simple Cycle Detection within the Web Graph library
The Web Graph library does not include an algorithm for Cycle Detection. However, within the scratch package for the Java source code there is a DynamicDAG (Directed Acrylic Graph) implementation that uses topological ordering based on an algorithm for Incremental Cycle Detection.1) 2). This appears to be a prototype.
Simple Cycle Detection using a Graph Database
Cycle Detection within Neo4J
The Neo4J library provides two methods for detecting cycles: cyphter queries and the APOC library.
Cypther is a declarative query language supported by neo4j. 3).
The following code was provided on Stack Overflow 4) as an example where l=15:
MATCH p=(n)-[*1..15]->(n) RETURN nodes(p)
However, as noted in the stack overflow discussion, this simple example has many limitations:
- The query is expensive for larger graphs, with the cost growing exponentially as the graph size and l are increased.
- It will return cycles consisting of a single vertex. To avoid this a minimum depth of 2 is required.
- It detects cycles that include duplicate vertices. A more complex query is necessary to eliminate duplicates for the detection of simple graphs.
For detecting cycles in a more convenient and better performing way the APOC library provides the apoc.nodes.cycles procedure for detecting all path cycles from a node list. 5)
The signature for the procedure is as follows:
apoc.nodes.cycles(nodes :: LIST? OF NODE?, config = {} :: MAP?) :: (path :: PATH?)
The documentation provides the following example:
CREATE (m1:Start {bar: 'alpha'}) with m1 CREATE (m1)-[:DEPENDS_ON {id: 0}]->(m2:Module {bar: 'one'})-[:DEPENDS_ON {id: 1}]->(m3:Module {bar: 'two'})-[:DEPENDS_ON {id: 2}]->(m1) WITH m1, m2, m3 CREATE (m1)-[:DEPENDS_ON {id: 3}]->(m2), (m2)-[:ANOTHER {id: 4}]->(m3), (m2)-[:DEPENDS_ON {id: 5}]->(m3) CREATE (m1)-[:DEPENDS_ON {id: 6}]->(:Module {bar: 'seven'})-[:DEPENDS_ON {id: 7}]->(:Module {bar: 'eight'})-[:DEPENDS_ON {id: 8}]->(m1);
CREATE (m1:Start {bar: 'beta'}) with m1 CREATE (m1)-[:MY_REL {id: 9}]->(m2:Module {bar: 'three'})-[:MY_REL {id: 10}]->(m3:Module {bar: 'four'})-[:MY_REL {id: 11}]->(m1);
CREATE (m1:Start {bar: 'gamma'}) with m1 CREATE (m1)-[:DEPENDS_ON {id: 12}]->(m2:Module {bar: 'five'})-[:DEPENDS_ON {id: 13}]->(m3:Module {bar: 'six'});
CREATE (m1:Start {bar: 'delta'}) with m1 CREATE (m1)-[:DEPENDS_ON {id: 20}]->(m1);
CREATE (m1:Start {bar: 'epsilon'}) with m1 CREATE (m1)-[:DEPTH_ONE {id: 30}]->(:Module {bar: 'seven'})-[:DEPTH_ONE {id: 31}]->(m1);
MATCH (m1:Start) WITH collect(m1) as nodes CALL apoc.nodes.cycles(nodes) YIELD path RETURN path
Cycle Detection within Tinkerpop
The Apache Tinkerpop documentation provides a recipe for Cycle Detection 6).
The following code detects cycles of arbitrary length.
g.V().as('a').repeat(both().simplePath()).emit(loops().is(gt(1))).
both().where(eq('a')).path().
dedup().by(unfold().order().by(id).dedup().fold())
This code takes advantage of the the following Gremlin steps:
- simplePath() step to ensure that no vertex is repeated within the cycle 7).
- loops() step to count the number of times a loop is repeated 8).
- dedup() step to remove repeated objects 9).
These loops() ad dedup() steps indicate inefficiencies in the recipe. The query tests for loops greater than 1. This means that a cycle is detected once it has been repeated twice, rather than when the first repeated node is detected. The use of deduplication means that redundant cycles are discovered and then eliminated.
References
* Golumbic, M. C. (2004). Algorithmic Graph Theory and Perfect Graphs (2nd ed., Chapter 1 - Graph Theoretic Foundations). North Holland Publishing Company, a Subsidiary of Eslevier.