New Algorithm Prototype

This is an old revision of the document!

For this prototype a new algorithm for compression is introduced

The algorithm represents a directed graph as a set of bitmaps. The use of bitmap encoding has two key benefits:

Bitmaps can encode information very efficiently. For example, a K64 complete graph with 64 vertexes and 2016 edges could be stored in just 64 words on a 64-bit architecture.
Binary logic operations can be performed very efficiently due to there support at the processor level.

¹⁾

The algorithm is described as follows:

Let BD be the dimension of a single bitmap. This is usually based on the underlying platform. For example, for a 32-bit architecture it will be 32 and for a 64-bit architecture it will be 64. It is possible to use datatype to support larger values for N but for the best performance BD should be aligned with the underlying system architecture.
Let BM be a bitmap encoded as BD words.
Let i be a position between 1 and BD1
Let V_i be a Vertex allocated to position i. A vertex can be allocated to 0 or 1 Vertices.
Let each column in the matrix represent a from-vertex.
Let each row in the matrix represent a to-vertex.
For each edge that exists within a graph the bit at the intersection between the from-vertex and the to-vertex is set to 1.
For any bit position set to zero no edge exists between the from-vertex and the to-vertex.

For example, consider a graph of 4 vertexes and 6 edges:

V = ( 1, 2, 3, 4 )
E = ( 1→2, 2→3, 3→4, 4→1 )

This graph is represented by 4 words, with each word having a length of 4 bits. The following table shows the bitmap used to encode this graph:

	From 1	From 2	From 3	From 4
To 1	0	0	0	1
To 2	1	0	0	0
To 3	0	1	0	0
To 4	0	0	1	0

This would be an array of bytes: [ 0001₂, 1000₂, 0100₂, 0010₂ ] = [8₁₀, 1₁₀, 2₁₀, 4₁₀].

This graph could be repreatened within a single 64bit long. Using the traditional method of representing each VId as a long and and edge as 2 VIds the graph would require 12 64bit longs for this specific graph. An empty graph with 4 vertices would require 4 longs and a complete graph with 4 vertices and 6 edges would require 16 longs.

A taxomomy of lossless graph compression representations lists just one bitmap based scheme: the DEX general-purpose system for large graphs. ²⁾.

The DEX system is described in two papers, which provide a high level description of the way in which bitmap indices are used. They are one of data structures used alongside other map types, links and various auxiliary structures. ³⁾ The graph is represented as a set of object identifiers (oids) which may identify an vertex (vid) or edge (eid). Within a bitmap each position is allocated to an oid. If the bit is set for a position, then the corresponding object identified by the oid is included in the set. ⁴⁾

This is different from the proposed encoding. The propose encoding is two dimensional with positions represeting only vertices. An edge is represented by a bit value being set at the intersection between the from-vertex and the to-vertex.

A library search for the terms “Bitmap”, “Graph” and Compression“, with the term “Image” excluded, yielded one relevant paper regarding the BitTrain, a proposed technique for training deep learning models with a low memory footprint that allows the network to learn continuously after deployment. ⁵⁾ This approach creates a similar two-dimensional binary map but the setting of a bit represents a non-zero value and not a non-zero value. However, if we consider that to be a graph of all non-zero edges within a neurual network then the two representations are similar. It is possible that the proposed algorithm is a generalisation of the bitmap representation used within BitTrain. More detail of the BitTrain method is required to verify.

Abdelrahman Hosny, Marina Neseem, and Sherief Reda. “BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the Edge.” arXiv.org (2021): n. pag. Web.
Besta, Maciej, and Torsten Hoefler. “Survey and Taxonomy of Lossless Graph Compression and Space-Efficient Graph Representations.” (2018): n. pag. Web.
Martinez-Bazan, N, S Gomez-Villamor, and F Escale-Claveras. “DEX: A High-Performance Graph Database Management System.” 2011 IEEE 27th International Conference on Data Engineering Workshops. IEEE, 2011. 124–127. Web.
Martínez-Bazan, Norbert et al. “Efficient Graph Management Based on Bitmap Indices.” Proceedings of the 16th International Database Engineering & Applications Sysmposium. ACM, 2012. 110–119. Web.

¹⁾

Martínez-Bazan, Norbert et al. 2012

²⁾

Besta, Maciej, and Torsten Hoefler. 2018

³⁾

Martínez-Bazan, et al. 2011

⁴⁾

Martínez-Bazan, et al. 2012

⁵⁾

Abdelrahman, etal. 2021