Distributed Datalog Materialisation with Dynamic Data Exchange
Temitope Ajileye and Boris Motik and Ian Horrocks
Department of Computer Science, University of Oxford, Oxford, United Kingdom.
Paper
The executables and test files published here accompany the paper by the authors listed above, which is available here.
Executable
The executable was compiled from C++ source code. Download the Windows or the Linux version. To run a script with DMAT, use the command:
./DMAT.exe -shell . <script file name>
Folder Structure
<partition root>
- DMAT
- [datalog programs]
- scripts/
- facts/
- output/
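When preparing several partition roots by hand, the folder skeleton above can be created with a small script. This is a minimal sketch; the helper name `make_partition_root` is hypothetical and not part of DMAT, and the DMAT executable and the datalog programs still have to be copied into each root separately.

```python
from pathlib import Path

# Sub-folders that every partition root contains, following the layout above.
SUBFOLDERS = ("scripts", "facts", "output")


def make_partition_root(root: Path) -> Path:
    """Create the empty folder skeleton of one partition root (hypothetical helper).

    Only scripts/, facts/ and output/ are created; the DMAT executable and
    the .dlog programs must be copied into `root` afterwards.
    """
    for name in SUBFOLDERS:
        # exist_ok makes the helper safe to re-run on an existing root
        (root / name).mkdir(parents=True, exist_ok=True)
    return root
```

For example, `make_partition_root(Path("lubm-0"))` would create `lubm-0/scripts`, `lubm-0/facts` and `lubm-0/output`; the prefix `lubm` is only an illustration.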
Partitions Generator
To partition the data (and compile the occurrences), run
./DMAT.exe -shell
Then, in interactive mode, run
partition <number of partitions> <source1> [<source2>...]
This uses graph partitioning. For hash partitioning, run
partition hash <number of partitions> <source1> [<source2>...]
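To illustrate what hash partitioning means in general, the sketch below assigns each fact to partition `hash(key) mod k`. It is not DMAT's implementation: which term DMAT hashes on is not documented here, so hashing the subject of a (subject, predicate, object) triple is an assumption, and `hash_partition` is a hypothetical name.

```python
import zlib
from collections import defaultdict


def hash_partition(triples, num_partitions):
    """Split (subject, predicate, object) triples into num_partitions lists.

    Hypothetical illustration: each triple goes to crc32(subject) mod k, so all
    triples sharing a subject end up in the same partition.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be positive")
    parts = defaultdict(list)
    for s, p, o in triples:
        # crc32 is stable across runs, unlike Python's salted str hash()
        idx = zlib.crc32(s.encode("utf-8")) % num_partitions
        parts[idx].append((s, p, o))
    return [parts[i] for i in range(num_partitions)]
```

Hashing is cheap and needs no global view of the data, but it ignores how facts are connected; graph partitioning generally tries to keep connected facts in the same partition so that less data has to be exchanged during reasoning.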
The output is a set of part files named source1.p0.etl.gz, source1.p1.etl.gz, ..., and an occurrence file named source1.ocr.gz or source1.ocr, depending on whether the input was compressed. Move each part file into the facts/ folder of the appropriate partition root, together with a copy of the occurrence file.
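The move-and-copy step can be scripted. The sketch below assumes, hypothetically, that part file `<source>.p<i>.etl.gz` belongs to the partition root `<prefix>-<i>` (the naming used by the script generator below); the function `distribute_parts` and its parameters are illustrative and not part of DMAT.

```python
import re
import shutil
from pathlib import Path

# Matches part files such as "lubm.p3.etl.gz" and captures the part index.
PART_RE = re.compile(r"^(?P<source>.+)\.p(?P<index>\d+)\.etl\.gz$")


def distribute_parts(src_dir: Path, roots_dir: Path, prefix: str, source: str) -> list[Path]:
    """Move each part file into <roots_dir>/<prefix>-<i>/facts/ and copy the
    occurrence file next to it (hypothetical helper; assumes part i -> root i)."""
    # The occurrence file is compressed only if the input was compressed.
    ocr = next((src_dir / n for n in (f"{source}.ocr.gz", f"{source}.ocr")
                if (src_dir / n).exists()), None)
    if ocr is None:
        raise FileNotFoundError(f"no occurrence file for {source}")
    moved = []
    for part in sorted(src_dir.glob(f"{source}.p*.etl.gz")):
        m = PART_RE.match(part.name)
        if m is None or m.group("source") != source:
            continue
        facts = roots_dir / f"{prefix}-{m.group('index')}" / "facts"
        facts.mkdir(parents=True, exist_ok=True)
        shutil.move(str(part), str(facts / part.name))  # the part file is moved
        shutil.copy2(ocr, facts / ocr.name)             # the occurrence file is copied
        moved.append(facts / part.name)
    return moved
```

For example, `distribute_parts(Path("."), Path("."), "lubm", "LUBM-100")` would place `LUBM-100.p0.etl.gz` and a copy of the occurrence file in `lubm-0/facts/`; all names here are only examples.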
Script Generator
For now, we recommend first preparing all partition root folders on a single server and then moving them to the appropriate servers. For networks with more than one server, we also recommend creating an additional node to serve as the terminal (it loads the entire dictionary and imports the program). The script testgen.py supports this workflow, including copying via scp.
- Place the Python script alongside the partition root folders, named <prefix>-0, <prefix>-1, ..., <prefix>-n. The last one is for the terminal node.
- In the same directory, create a hosts.txt file with one server address specification per line:
  - <server0-name> <host-name/IP-address>
  - <server1-name> <host-name/IP-address>
  - ...
- In the same directory, create a settings.txt file with the following values, one per line:
  - <prefix>
  - <dataset> (without .p0.etl.gz)
  - <ruleset> (without .dlog)
  - <number of servers>
  - <number of threads per server> (multithreading is still unstable, so we recommend setting this to 1)
- Run python testgen.py. This populates all script folders with the appropriate scripts and creates three additional sets of scripts:
  a. A transfer.sh executable that copies all partition folders to the target hosts.
  b. A transferDMAT.sh executable that copies the DMAT executable to the target hosts.
  c. A test<numthreads> executable in each partition root, which you can use to run DMAT on the hosting server.
- Run a, b, and c on each host to run a test.
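The two input files above are simple enough to check before running testgen.py. The sketch below parses them and builds, without running, one scp command per partition root. It is not testgen.py: the function names are hypothetical, and the mapping (the i-th line of hosts.txt receives <prefix>-i) and the remote destination directory are assumptions; the generated transfer.sh may do this differently.

```python
import shlex
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Settings:
    prefix: str
    dataset: str
    ruleset: str
    num_servers: int
    threads_per_server: int


def read_settings(path: Path) -> Settings:
    """Parse settings.txt: five values, one per line, in the order listed above."""
    lines = [line.strip() for line in path.read_text().splitlines() if line.strip()]
    if len(lines) < 5:
        raise ValueError("settings.txt needs 5 non-empty lines")
    prefix, dataset, ruleset, servers, threads = lines[:5]
    return Settings(prefix, dataset, ruleset, int(servers), int(threads))


def read_hosts(path: Path) -> list[tuple[str, str]]:
    """Parse hosts.txt: '<server-name> <host-name/IP-address>' on each line."""
    hosts = []
    for line in path.read_text().splitlines():
        fields = line.split()
        if len(fields) == 2:
            hosts.append((fields[0], fields[1]))
        elif fields:  # skip blank lines, reject anything else
            raise ValueError(f"malformed hosts.txt line: {line!r}")
    return hosts


def scp_commands(settings: Settings, hosts: list[tuple[str, str]], remote_dir: str = "~") -> list[str]:
    """Build (but do not run) one 'scp -r' command per partition root.

    Hypothetical mapping: the i-th host receives <prefix>-i under remote_dir.
    """
    return [
        f"scp -r {shlex.quote(f'{settings.prefix}-{i}')} {shlex.quote(address)}:{remote_dir}"
        for i, (_name, address) in enumerate(hosts)
    ]
```

Printing the commands instead of executing them makes it easy to compare them with the generated transfer.sh before copying anything.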
Test Files
We have included two example rulesets: LUBM_L (lower bound) and LUBM_LE (lower bound extended with additional transitivity rules).