Graph Processing with the Rete Algorithm

This is an old revision of the document!

I believe that the Rete Algorithm is a powerful innovation for graph processing that has been overlooked because it is seen as a component of Expert Systesm are not as a innovation in it's own right. The history of innovation provides examples of innovations that are made in service of a larger goal that can potentially fail. The adhesive for Post Notes was part of a failed product for over a decade before being repurposed into a successful product. (Edgerton, D. 2010) My hypothesis is that the RETE algorithm is a similar innovation.

I believe that the adaoption of the RETE algorithms in new contexts will bring the following potential benefits:

Language Alignment. The RETE approach supports the creation of declarative languages that are more closely aligned to the problem domain than iterative languages.
Continuous Matching. The RETE has proven itself as effective at continuous pattern matching, evaluating changes in state quickly and efficiently.
Parallel Execution. By removing the need for traversal the RETE approach removes a barrier to parallel execution.
Execution Plan. The RETE approach creates a network for evaluating state changes and executing actions when specific conditions occur. This network can be examined and optimized.
Runtime Optimization. The dynamic nature of the RETE approach supports the adjustment of the generated network at runtime to maintain optimal performance within a changing environment.

In order to test this hypothesis, I will extract the RETE algorithm from it's context of production systems by identifying and isolating the problem it was designed to solve within that context. I will then identify a different context with the same problem, and attempt to apply the algorithm to solve that problem within the new context.

The context of The Rete Algorithm was made celar when it was introduced in the paper “A network match routine for production systems” by Chales L Forgy. ¹⁾ Forgy went on to develop the OPS5 production system based on the algorithm. ²⁾

Production systems process a set of productions. A production is an if-then statement that can be evaluated to true or false. The productions are defined in if-then statements that provide the if condition on the left hand side (LHS) and the action to then be taken on the right hand side (RHS). These productions are applied to a set of facts held in a global database referred to as working memory. The engine executes the productions by following the process:

Match. The LHS are evaluated to see if their conditions are true given the current state of the working memory.
Conflict Resolution. Since RHS actions must be executed sequentially it it necessary to decide the order of execution when more than one LHS evaluates as true.
Act. Execute each of the RHS actions.

Production Systems had a problem: execution speed. Forgey explains:

“Since execution speed has always been a major issue for production systems, several researchers have worked on the problem of efficiency. The most

common approach has been to combine a process called indexing with direct interpretation of the LHSs. In the simplest form of indexing, the interpreter begins the match process by extracting one or more features from each working memory element, and uses those features to hash into the collection of productions.” ³⁾

In order to improve efficiency, it is necessary to avoid traversing over the facts working memory. The accepted solution for the performance problem at the time of writing was an indexing approach that would create hashing functions based on the LHS conditions. When a new fact is introduced to working memory hashing functions are applied to quickly identify and index facts that meet certain conditions. These indexes can then be used to reduce the number of facts that need to be traversed in order to evaluate the LHS conditions fully.

Forgey's innovation was to represent the indexing function as a network of simple feature regonizers. Processing the changes to working memory through this network entirely removes the need to iterate over and evaluate facts.

In summary:

Context: Production Systems
Problem: Poor performance due to traversing over many items
Solution: Use a network of simple feature recognizers to avoid traversal

The next step is to identify a new context where the same problem is evident. Graph Processing is a candidate, with the problem of Graph Compression providing a specific problem for applying the RETE approach and assessing it's applicability within this new domain.

Graph compression’s focus is on reducing the storage space required to store the graph. This is achieved by compressing space adjacent edges using efficient encoding methods such as bitmap compression, reference encoding, delta encoding or word aligned hybrid. An example of this approach can be seen in the Webgraph framework which uses reference encoding for compression. ⁴⁾ The Webgraph framework is recognised as state of the art in the field of graph compression. ⁵⁾

The Partition and Code (PnC) framework ⁶⁾ provides a framework for systematically applying the RETE approach to the problem of graph compression. It provides a standard model consisting of the following steps:

Partitioning – apply an algorithm to decompose each graph into subgraphs.
Graph Dictionary – Translate the subgraphs into a mapping of connected subgraphs.
Graph Encoding – Compress the graph into an encoded output.

The first two steps of Partitioning and Graph Dictionary corresponding to the indexing stage that was commonly used for production systems. I can be described as follows:

Iterate over all vertices in the graph and identify grouping into subgraphs using patterns such as neighbourhoods, centralities, connected components and hyperball.
Hash the subgraphs and store them within collections.
Iterate over those collections to produce an efficient encoded output.

The RETE algorithm would instead adopt a network of simple pattern matchers for identifying subgraphs. The RHS actions for these functions would maintain an internal model of the graph as a set of subgraphs. Additional nodes then match patterns within this model in order to maintain an efficient encoded representation of the graph.

For evaluation of the potential benefits of Langauge Alignment and Continous Matching prototypes will be created using the CLIPS rules engine. It is believed that the CLIPS langauge will allow for these benefits to be demonstrated without the need for chaning it's implementation.

This analysis will be applied over the following prototypes:

Simple Algorithm. A well understood algorithm for idenitfying elementary cycles is prototyped to assess the value of the RETE approach in the graph processing context.
Existing Algorithm. An existing method for graph compression is prototyped to assess the value of the RETE approach in the graph compression context.
New Algorithm. A novel compression method for graph compression is prototyped to assess the value of the RETE approach for the development of new graph algorithms.
End to End Process. The algorithms developed will be implemented within a pipes and filters framework to assess the composability of the RETE approach.

For evaluating the potential benefits of Parallel Processing, Execution Plan and Runtime Optmisation research will be carried out. It is believed that the CLIPS runtime would need to be updated in order to demonstrate these benefits. Updates to the CLIPS runtime is outside the scope of this project.

The research will identify existing solutions that demonstrate these benefits, and provide an overview of the porgress made. Recommendations will be made for how these benefits might be achieved within a RETE based graph processing solution.

The following diagram provides an overview of the project's strucutre, including the 5 potentional benefits and the 2 methods being taken to explore those benefits.

⁷⁾

Boldi, P. and Vigna, S. (2004) “The Webgraph Framework I,” Proceedings of the 13th international conference on World Wide Web [Preprint]. Available at: https://doi.org/10.1145/988672.988752.
Bouritsas, G. et al. (2021) Partition and code: Learning how to compress graphs, arXiv.org. Available at: https://arxiv.org/abs/2107.01952 (Accessed: March 10, 2023).
Claude, F. and Navarro, G. (2010) “Fast and compact web graph representations,” ACM Transactions on the Web, 4(4), pp. 1–31. Available at: https://doi.org/10.1145/1841909.1841913.
Edgerton, D. (2010). Innovation, technology, or history: What is the historiography of technology about. Technology and Culture, 51(3), 680-697.
Forgy, C.L. (1974) “A network match routine for production systems.,” Working Paper
Forgy, C.L. (1982) “Rete: A fast algorithm for the many pattern/many object pattern match problem,” Artificial Intelligence, 19(1), pp. 17–37. Available at: https://doi.org/10.1016/0004-3702(82)90020-0.