Simple Algorithm Prototype

While cycle detection is a fundamental algorithm in graph theory, neither the WebGraph library nor the graph databases include it as a built-in feature.

For the graph databases, there are libraries that implement cycle detection, and the documentation includes examples showing how it can be achieved. The documented approaches discover duplicate cycles that then have to be filtered out.

Using the CLIPS rule engine, it is possible to implement a simple cycle detection algorithm that maps clearly onto the mathematical description of a cycle. It performs acceptably without optimization and supports incremental detection with minimal effort.

The goal of this prototype is to implement a simple algorithm for detecting simple cycles within a graph.

Let G = (V, E) be a directed graph consisting of a finite set of vertices V and a finite set of edges E. Each edge describes a relationship v₁ → v₂ where v₁ and v₂ are distinct members of V.

Within G, a cycle of length l + 1 is a sequence of vertices [v₀,v₁,…,v_l] where v_{i-1} → v_i for i = 1, 2, …, l and v_l → v₀.

The cycle is simple when all vertices in [v₀,v₁,…,v_l] are distinct, so that v_i ≠ v_j for i ≠ j.
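The two definitions above can be checked mechanically. The following is a minimal sketch, assuming the graph is given as a set of (source, target) pairs; the function names `is_cycle` and `is_simple_cycle` are illustrative and do not come from the prototype.

```python
from typing import Hashable, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def is_cycle(vertices: Sequence[Hashable], edges: Set[Edge]) -> bool:
    """True when each vertex links to the next and the last links back to the first."""
    if not vertices:
        return False
    # Append the first vertex so the closing edge is checked like any other.
    closed = list(vertices) + [vertices[0]]
    return all((a, b) in edges for a, b in zip(closed, closed[1:]))


def is_simple_cycle(vertices: Sequence[Hashable], edges: Set[Edge]) -> bool:
    """A cycle is simple when no vertex appears more than once in the sequence."""
    return len(set(vertices)) == len(vertices) and is_cycle(vertices, edges)
```

With the edges a → b, b → c, c → a and c → b, the sequence [a, b, c] is a simple cycle, while [a, b, c, b, c] is a cycle that is not simple because b and c are repeated.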

The WebGraph library does not include an algorithm for cycle detection. However, within the scratch package of the Java source code there is a DynamicDAG (Directed Acyclic Graph) implementation that uses topological ordering based on an algorithm for incremental cycle detection ¹⁾ ²⁾. This appears to be a prototype.
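To illustrate what incremental cycle detection means, here is a naive baseline sketch. It is not the algorithm of Haeupler et al. and not the DynamicDAG code: it simply runs a depth-first search on every insertion instead of maintaining a topological order. The class name `IncrementalDigraph` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set


class IncrementalDigraph:
    """Directed graph that rejects any edge whose insertion would close a cycle."""

    def __init__(self) -> None:
        self.successors: Dict[Hashable, Set[Hashable]] = defaultdict(set)

    def _reaches(self, source: Hashable, target: Hashable) -> bool:
        """Depth-first search: is there a path from source to target?"""
        stack, seen = [source], set()
        while stack:
            vertex = stack.pop()
            if vertex == target:
                return True
            if vertex in seen:
                continue
            seen.add(vertex)
            stack.extend(self.successors.get(vertex, ()))
        return False

    def add_edge(self, u: Hashable, v: Hashable) -> bool:
        """Insert u -> v and return True, or return False if it would create a cycle."""
        # The new edge closes a cycle exactly when v can already reach u.
        if self._reaches(v, u):
            return False
        self.successors[u].add(v)
        return True
```

Each insertion costs a full search in the worst case. Algorithms based on topological ordering aim to reduce that cost by restricting the part of the graph that has to be examined.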

Cycle Detection within Neo4j

Neo4j offers two ways of detecting cycles: Cypher queries and the APOC library.

Cypher is a declarative query language supported by Neo4j ³⁾.

The following code was provided on Stack Overflow ⁴⁾ as an example that matches paths of up to 15 relationships:

MATCH p=(n)-[*1..15]->(n) RETURN nodes(p)

However, as noted in the Stack Overflow discussion, this simple example has many limitations:

The query is expensive for larger graphs, with the cost growing exponentially as the graph size and l are increased.
It will return cycles consisting of a single vertex. To avoid this a minimum depth of 2 is required.
It detects cycles that include duplicate vertices. A more complex query is necessary to eliminate them when only simple cycles are wanted.
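The limitations listed above can be reproduced outside the database. The sketch below enumerates every bounded-length path that returns to its start vertex, in the spirit of the query. It assumes an adjacency-dictionary graph without parallel edges and, like a single Cypher pattern match, never traverses the same relationship twice within one path. The function name `closed_paths` is illustrative.

```python
from typing import Dict, Hashable, List, Sequence

Graph = Dict[Hashable, Sequence[Hashable]]


def closed_paths(graph: Graph, max_length: int) -> List[List[Hashable]]:
    """Return every path of 1..max_length edges that ends at its start vertex."""
    found: List[List[Hashable]] = []

    def extend(path, used_edges):
        # A path with n vertices has n - 1 edges; stop at the length bound.
        if len(path) - 1 == max_length:
            return
        for nxt in graph.get(path[-1], ()):
            edge = (path[-1], nxt)
            if edge in used_edges:
                continue  # an edge may appear only once per path
            new_path = path + [nxt]
            if nxt == path[0]:
                found.append(new_path)
            # Keep extending: vertices, unlike edges, may be revisited.
            extend(new_path, used_edges | {edge})

    for start in graph:
        extend([start], frozenset())
    return found
```

On a single triangle a → b → c → a the function reports three paths, one per start vertex, so the same cycle is found once per rotation. On a graph where a has two-cycles with both b and c, it also reports [a, b, a, c, a], which revisits a and is therefore not a simple cycle.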

For detecting cycles in a more convenient and better-performing way, the APOC library provides the apoc.nodes.cycles procedure, which detects all path cycles from a node list ⁵⁾.

The signature for the procedure is as follows:

apoc.nodes.cycles(nodes :: LIST? OF NODE?, config = {} :: MAP?) :: (path :: PATH?)

The documentation provides the following example:

CREATE (m1:Start {bar: 'alpha'}) with m1 CREATE (m1)-[:DEPENDS_ON {id: 0}]->(m2:Module {bar: 'one'})-[:DEPENDS_ON {id: 1}]->(m3:Module {bar: 'two'})-[:DEPENDS_ON {id: 2}]->(m1)  WITH m1, m2, m3 CREATE (m1)-[:DEPENDS_ON {id: 3}]->(m2), (m2)-[:ANOTHER {id: 4}]->(m3), (m2)-[:DEPENDS_ON {id: 5}]->(m3) CREATE (m1)-[:DEPENDS_ON {id: 6}]->(:Module {bar: 'seven'})-[:DEPENDS_ON {id: 7}]->(:Module {bar: 'eight'})-[:DEPENDS_ON {id: 8}]->(m1);
CREATE (m1:Start {bar: 'beta'}) with m1 CREATE (m1)-[:MY_REL {id: 9}]->(m2:Module {bar: 'three'})-[:MY_REL  {id: 10}]->(m3:Module {bar: 'four'})-[:MY_REL {id: 11}]->(m1);
CREATE (m1:Start {bar: 'gamma'}) with m1 CREATE (m1)-[:DEPENDS_ON {id: 12}]->(m2:Module {bar: 'five'})-[:DEPENDS_ON {id: 13}]->(m3:Module {bar: 'six'});
CREATE (m1:Start {bar: 'delta'}) with m1 CREATE (m1)-[:DEPENDS_ON {id: 20}]->(m1);
CREATE (m1:Start {bar: 'epsilon'}) with m1 CREATE (m1)-[:DEPTH_ONE {id: 30}]->(:Module {bar: 'seven'})-[:DEPTH_ONE {id: 31}]->(m1);

MATCH (m1:Start) WITH collect(m1) as nodes CALL apoc.nodes.cycles(nodes) YIELD path RETURN path

Cycle Detection within Tinkerpop

The Apache Tinkerpop documentation provides a recipe for Cycle Detection ⁶⁾.

The following code detects cycles of arbitrary length.

g.V().as('a').repeat(both().simplePath()).emit(loops().is(gt(1))).
  both().where(eq('a')).path().
  dedup().by(unfold().order().by(id).dedup().fold())

This code takes advantage of the following Gremlin steps:

simplePath() step to ensure that no vertex is repeated within the cycle ⁷⁾.
loops() step to count the number of times a loop is repeated ⁸⁾.
dedup() step to remove repeated objects ⁹⁾.

The loops() and dedup() steps indicate inefficiencies in the recipe. The query tests for loops greater than 1. This means that a cycle is detected once it has been repeated twice, rather than when the first repeated node is detected. The use of deduplication means that redundant cycles are discovered and then eliminated.
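The effect of the final dedup() in the recipe can be sketched in a few lines. The example assumes that each discovered path is a list of vertex identifiers and that, as in the recipe, the deduplication key is the sorted set of vertices on the path. The function name `dedup_by_vertex_set` is illustrative.

```python
from typing import Hashable, Iterable, List, Sequence


def dedup_by_vertex_set(paths: Iterable[Sequence[Hashable]]) -> List[List[Hashable]]:
    """Keep the first path seen for each distinct set of vertices."""
    seen = set()
    unique: List[List[Hashable]] = []
    for path in paths:
        # Sorting the distinct vertices gives the same key for every rotation
        # and for both traversal directions of the same cycle.
        key = tuple(sorted(set(path)))
        if key not in seen:
            seen.add(key)
            unique.append(list(path))
    return unique
```

Every rotation of a cycle has to be discovered before it can be discarded, which is the redundancy described above. Because the key ignores direction, it fits a traversal that walks edges both ways; in a directed graph, two different cycles over the same vertices would collapse into one.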

* Golumbic, M. C. (2004). Algorithmic Graph Theory and Perfect Graphs (2nd ed., Chapter 1 - Graph Theoretic Foundations). North Holland Publishing Company, a Subsidiary of Elsevier.

¹⁾ https://github.com/vigna/webgraph/blob/master/src/it/unimi/dsi/webgraph/scratch/DynamicDAG.java#L9

²⁾ Haeupler, Bernhard, et al. (2012). “Incremental cycle detection, topological ordering, and strong component maintenance.” ACM Transactions on Algorithms.

³⁾ https://neo4j.com/docs/cypher-manual/current/introduction/

⁴⁾ https://stackoverflow.com/questions/39196706/find-all-simple-cycles-through-a-given-node-in-neo4j

⁵⁾ https://neo4j.com/labs/apoc/4.3/overview/apoc.nodes/apoc.nodes.cycles/

⁶⁾ https://tinkerpop.apache.org/docs/current/recipes/#cycle-detection

⁷⁾ https://tinkerpop.apache.org/docs/3.6.4/reference/#simplepath-step

⁸⁾ https://tinkerpop.apache.org/docs/3.6.4/reference/#loops-step