This is an old revision of the document!
New Algorithm Prototype
Introduction
For this prototype a new algorithm for compression is introduced
The Algorithm
Graph Representation
The algorithm represents a directed graph as a set of bitmaps. The use of bitmap encoding has two key benefits:
- Bitmaps can encode information very efficiently. For example, a K64 complete graph with 64 vertexes and 2016 edges could be stored in just 64 words on a 64-bit architecture.
- Binary logic operations can be performed very efficiently due to there support at the processor level.
The algorithm is described as follows:
- Let BD be the dimension of a single bitmap. This is usually based on the underlying platform. For example, for a 32-bit architecture it will be 32 and for a 64-bit architecture it will be 64. It is possible to use datatype to support larger values for N but for the best performance BD should be aligned with the underlying system architecture.
- Let BM be a bitmap encoded as BD words.
- Let i be a position between 1 and BD1
- Let Vi be a Vertex allocated to position i. A vertex can be allocated to 0 or 1 Vertices.
- Let each column in the matrix represent a from-vertex.
- Let each row in the matrix represent a to-vertex.
- For each edge that exists within a graph the bit at the intersection between the from-vertex and the to-vertex is set to 1.
- For any bit position set to zero no edge exists between the from-vertex and the to-vertex.
For example, consider a graph of 4 vertexes and 6 edges:
- V = ( 1, 2, 3, 4 )
- E = ( 1→2, 2→3, 3→4, 4→1 )
This graph is represented by 4 words, with each word having a length of 4 bits. The following table shows the bitmap used to encode this graph:
| From 1 | From 2 | From 3 | From 4 | |
|---|---|---|---|---|
| To 1 | 0 | 0 | 0 | 1 |
| To 2 | 1 | 0 | 0 | 0 |
| To 3 | 0 | 1 | 0 | 0 |
| To 4 | 0 | 0 | 1 | 0 |
This would be an array of bytes: [ 00012, 10002, 01002, 00102 ] = [810, 110, 210, 410].
This graph could be repreatened within a single 64bit long. Using the traditional method of representing each VId as a long and and edge as 2 VIds the graph would require 12 64bit longs for this specific graph. An empty graph with 4 vertices would require 4 longs and a complete graph with 4 vertices and 6 edges would require 16 longs.
Originality
A taxomomy of lossless graph compression representations lists just one bitmap based scheme: the DEX general-purpose system for large graphs. 2).
The DEX system is described in two papers, which provide a high level description of the way in which bitmap indices are used. They are one of data structures used alongside other map types, links and various auxiliary structures. 3) The graph is represented as a set of object identifiers (oids) which may identify an vertex (vid) or edge (eid). Within a bitmap each position is allocated to an oid. If the bit is set for a position, then the corresponding object identified by the oid is included in the set. 4)
This is different from the proposed encoding. The propose encoding is two dimensional with positions represeting only vertices. An edge is represented by a bit value being set at the intersection between the from-vertex and the to-vertex.
A library search for the terms “Bitmap”, “Graph” and Compression“, with the term “Image” excluded, yielded one relevant paper regarding the BitTrain, a proposed technique for training deep learning models with a low memory footprint that allows the network to learn continuously after deployment. 5) This approach creates a similar two-dimensional binary map but the setting of a bit represents a non-zero value and not a non-zero value. However, if we consider that to be a graph of all non-zero edges within a neurual network then the two representations are similar. It is possible that the proposed algorithm is a generalisation of the bitmap representation used within BitTrain. More detail of the BitTrain method is required to verify.
References
- Abdelrahman Hosny, Marina Neseem, and Sherief Reda. “BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the Edge.” arXiv.org (2021): n. pag. Web.
- Besta, Maciej, and Torsten Hoefler. “Survey and Taxonomy of Lossless Graph Compression and Space-Efficient Graph Representations.” (2018): n. pag. Web.
- Martinez-Bazan, N, S Gomez-Villamor, and F Escale-Claveras. “DEX: A High-Performance Graph Database Management System.” 2011 IEEE 27th International Conference on Data Engineering Workshops. IEEE, 2011. 124–127. Web.
- Martínez-Bazan, Norbert et al. “Efficient Graph Management Based on Bitmap Indices.” Proceedings of the 16th International Database Engineering & Applications Sysmposium. ACM, 2012. 110–119. Web.