Differences

This shows you the differences between two versions of the page.

--- retegraphprocessing:elements:prototyping:new_algorithm [2023/05/15 10:01] – [Description] gedbadmin
+++ retegraphprocessing:elements:prototyping:new_algorithm [2024/01/19 17:46] (current) – removed gedbadmin
@@ Line 1: / Line 1: @@
-====== New Algorithm Prototype ======
-===== Introduction =====
-For this prototype a new algorithm for compression is introduced
-===== The Algorithm =====
-==== Graph Representation ====
-The algorithm represents a directed graph as a set of bitmaps.  The use of bitmap encoding has two key benefits:
-  - Bitmaps can encode information very efficiently.  For example, a K64 complete graph with 64 vertexes and 2016 edges could be stored in just 64 words on a 64-bit architecture.
-  - Binary logic operations can be performed very efficiently due to there support at the processor level.
-((Martínez-Bazan, Norbert et al. 2012))
-The algorithm is described as follows:
-  - Let BD be the dimension of a single bitmap.  This is usually based on the underlying platform.  For example, for a 32-bit architecture it will be 32 and for a 64-bit architecture it will be 64.  It is possible to use datatype to support larger values for N but for the best performance BD should be aligned with the underlying system architecture.
-  - Let BM be a bitmap encoded as BD words.
-  - Let i be a position between 1 and BD1
-  - Let V<sub>i</sub> be a Vertex allocated to position i.  A vertex can be allocated to 0 or 1 Vertices.
-  - Let each column in the matrix represent a from-vertex.
-  - Let each row in the matrix represent a to-vertex.
-  - For each edge that exists within a graph the bit at the intersection between the from-vertex and the to-vertex is set to 1.
-  - For any bit position set to zero no edge exists between the from-vertex and the to-vertex.
-For example, consider a graph of 4 vertexes and 6 edges:
-  * V = ( 1, 2, 3, 4 )
-  * E = ( 1->2, 2->3, 3->4, 4->1 )
-This graph is represented by 4 words, with each word having a length of 4 bits.  The following table shows the bitmap used to encode this graph:
-|       ^ From 1 ^ From 2 ^ From 3 ^ From 4 ^
-^ To 1  |   0    |   0    |   0    |   1    |
-^ To 2  |   1    |   0    |   0    |   0    |
-^ To 3  |   0    |   1    |   0    |   0    |
-^ To 4  |   0    |   0    |   1    |   0    |
-This would be an array of bytes: [ 0001<sub>2</sub>, 1000<sub>2</sub>, 0100<sub>2</sub>, 0010<sub>2</sub> ] = [8<sub>10</sub>, 1<sub>10</sub>, 2<sub>10</sub>, 4<sub>10</sub>].
-This graph could be repreatened within a single 64bit long.  Using the traditional method of representing each VId as a long and and edge as 2 VIds the graph would require 12 64bit longs for this specific graph.  An empty graph with 4 vertices would require 4 longs and a complete graph with 4 vertices and 6 edges would require 16 longs.
-==== Originality ====
-A taxomomy of lossless graph compression representations lists just one bitmap based scheme: the DEX general-purpose system for large graphs.  ((Besta, Maciej, and Torsten Hoefler. 2018)).
-The DEX system is described in two papers, which provide a high level description of the way in which bitmap indices are used.  They are one of data structures used alongside other map types, links and various auxiliary structures.  ((Martínez-Bazan, et al. 2011))  The graph is represented as a set of object identifiers (oids) which may identify an vertex (vid) or edge (eid).  Within a bitmap each position is allocated to an oid.  If the bit is set for a position, then the corresponding object identified by the oid is included in the set.  ((Martínez-Bazan, et al. 2012))
-This is different from the proposed encoding.  The propose encoding is two dimensional with positions represeting only vertices.  An edge is represented by a bit value being set at the intersection between the from-vertex and the to-vertex.
-A library search for the terms "Bitmap", "Graph" and Compression", with the term "Image" excluded, yielded one relevant paper regarding the BitTrain, a proposed technique for training deep learning models with a low memory footprint that allows the network to learn continuously after deployment.  ((Abdelrahman, etal. 2021))  This approach creates a similar two-dimensional binary map but the setting of a bit represents a non-zero value and not a non-zero value.  However, if we consider that to be a graph of all non-zero edges within a neurual network then the two representations are similar.  It is possible that the proposed algorithm is a generalisation of the bitmap representation used within BitTrain.  More detail of the BitTrain method is required to verify.
-====== References ======
-  * Abdelrahman Hosny, Marina Neseem, and Sherief Reda. “BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the Edge.” arXiv.org (2021): n. pag. Web.
-  * Besta, Maciej, and Torsten Hoefler. “Survey and Taxonomy of Lossless Graph Compression and Space-Efficient Graph Representations.” (2018): n. pag. Web.
-  * Martinez-Bazan, N, S Gomez-Villamor, and F Escale-Claveras. “DEX: A High-Performance Graph Database Management System.” 2011 IEEE 27th International Conference on Data Engineering Workshops. IEEE, 2011. 124–127. Web.
-  * Martínez-Bazan, Norbert et al. “Efficient Graph Management Based on Bitmap Indices.” Proceedings of the 16th International Database Engineering & Applications Sysmposium. ACM, 2012. 110–119. Web.