Goal-Driven Query Answering over First- and Second-Order Dependencies with Equality: Code, Datasets, and Experimental Results

Efthymia Tsamoura and Boris Motik

This Web page contains the code, the datasets, and the results of the experiments described in the paper "Goal-Driven Query Answering over First- and Second-Order Dependencies with Equality", which is currently under review at the Artificial Intelligence Journal. The paper describes a novel method that answers a query over dependencies without computing the chase in full; that is, it aims to derive only the consequences relevant to the query.

Code

The code of our prototype is included in the code.zip archive. The prototype implements a small in-memory database that can load datasets and dependencies in an extension of the ChaseBench format. The system was written in Java as a wrapper around the well-known RDFox system. RDFox is written in C++, so it must be compiled separately for each target platform. The code archive includes the lib/JRDFox.jar file, which bundles a research version of RDFox that runs on modern x86_64 systems with a recent version of the Linux operating system. We cannot provide RDFox releases for other platforms at the moment.

Test Scenarios

First-Order Scenarios

The first-order test scenarios are the same as in ChaseBench; however, to simplify parsing, we changed the syntax of equality atoms from t1 = t2 to =(t1,t2). All first-order scenarios are given below in full.
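The syntax change can be illustrated with a minimal Python sketch. It is not part of our prototype: the function name to_prefix_equality is hypothetical, and the sketch assumes that terms are variables such as ?X1 or simple constants without spaces, commas, or parentheses, and that terms are separated from the -> arrow by whitespace.

```python
import re

# Assumption: a term is a run of characters without whitespace, commas,
# parentheses, or '='. The full ChaseBench grammar may allow more.
_TERM = r"[^\s,()=]+"
_INFIX_EQUALITY = re.compile(rf"({_TERM})\s*=\s*({_TERM})")


def to_prefix_equality(dependency: str) -> str:
    """Rewrite every infix equality 't1 = t2' as the prefix atom '=(t1,t2)'."""
    return _INFIX_EQUALITY.sub(r"=(\1,\2)", dependency)


if __name__ == "__main__":
    print(to_prefix_equality("r(?X,?Y), r(?X,?Z) -> ?Y = ?Z ."))
```

Atoms that are already in the prefix form are left unchanged, so applying the function twice gives the same result as applying it once.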

Each test scenario consists of a base instance stored as one CSV file per relation without a header, a set of dependencies, and a set of queries. The dependencies are usually organised into source-to-target TGDs (stored in a file with a name of the form <file>.st-tgds.txt), target TGDs (stored in a file with a name of the form <file>.t-tgds.txt), and target EGDs (stored in a file with a name of the form <file>.t-egds.txt). Below is an example of a source-to-target TGD.

    v0(?X1,?X2,?X7,?X8) -> m87004(?X1,?X2,?X7,?X8), m298004(?X2,?X3,?X9,?X10), m113004(?X3,?X4,?X11,?X12) .
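To make the format concrete, the following Python sketch reads a headerless CSV relation and splits a dependency into its body and head atoms. It is an illustration only and does not reproduce the parser of our prototype: all function names are hypothetical, atoms are assumed to have flat argument lists, and CSV values are assumed to follow the default conventions of Python's csv module.

```python
import csv
import io
import re

# Assumption: an atom is a predicate name followed by a flat, comma-separated
# argument list; nested terms are not handled by this pattern.
_ATOM = re.compile(r"([^\s(),]+)\(([^()]*)\)")


def read_relation(csv_text):
    """Return the rows of a headerless CSV relation as a list of tuples."""
    return [tuple(row) for row in csv.reader(io.StringIO(csv_text)) if row]


def parse_atoms(conjunction):
    """Split a conjunction such as 'p(?X,?Y), q(?Y)' into (name, args) pairs."""
    return [
        (name, tuple(arg.strip() for arg in args.split(",")))
        for name, args in _ATOM.findall(conjunction)
    ]


def parse_dependency(line):
    """Parse 'body -> head .' into a pair (body_atoms, head_atoms)."""
    body, arrow, head = line.partition("->")
    if not arrow:
        raise ValueError(f"not a dependency: {line!r}")
    return parse_atoms(body), parse_atoms(head)


def existential_variables(body, head):
    """Return the variables (terms starting with '?') that occur only in the head."""
    body_vars = {t for _, args in body for t in args if t.startswith("?")}
    head_vars = {t for _, args in head for t in args if t.startswith("?")}
    return head_vars - body_vars
```

Applied to the TGD above, parse_dependency returns one body atom and three head atoms, and existential_variables reports ?X3, ?X4, ?X9, ?X10, ?X11, and ?X12 as the existentially quantified variables.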

Each query is stored in a separate file. Below is an example of a query.

    q01(?m63004_c0) <- m63004(?m63004_c0,?m63004_c1,?m63004_c2,?m63004_c3), m197004(?m63004_c0,?m197004_c1,?m197004_c2,?m197004_c3) .
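A query of this form can be split into its head and body with a few lines of Python. The sketch below is an assumption-laden illustration rather than the code of our prototype: the name parse_query is hypothetical, atoms are assumed to have flat argument lists, and the safety check (each answer variable must occur in the body) is added here for illustration.

```python
import re

# Assumption: an atom is a predicate name followed by a flat, comma-separated
# argument list, as in the example query above.
_ATOM = re.compile(r"([^\s(),]+)\(([^()]*)\)")


def _atoms(text):
    """Extract (name, args) pairs from a comma-separated list of atoms."""
    return [
        (name, tuple(arg.strip() for arg in args.split(",")))
        for name, args in _ATOM.findall(text)
    ]


def parse_query(line):
    """Parse 'head <- body .' into a pair (head_atom, body_atoms)."""
    head_text, arrow, body_text = line.partition("<-")
    if not arrow:
        raise ValueError(f"not a query: {line!r}")
    head_atoms = _atoms(head_text)
    if len(head_atoms) != 1:
        raise ValueError("a query must have exactly one head atom")
    head = head_atoms[0]
    body = _atoms(body_text)
    # Safety check: every answer variable must also occur in the body.
    body_terms = {term for _, args in body for term in args}
    unsafe = [t for t in head[1] if t.startswith("?") and t not in body_terms]
    if unsafe:
        raise ValueError(f"answer variables missing from the body: {unsafe}")
    return head, body
```

For the query above, the head is the atom q01 with the single answer variable ?m63004_c0, and the body consists of the atoms m63004 and m197004, which are joined on that variable.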

Second-Order Scenarios

The second-order scenarios listed below were generated using the algorithm presented in Section 5 of our paper. The syntax used is the same as for first-order scenarios. However, each scenario contains a file with a name of the form <file>.st-tgds.txt that contains all dependencies (i.e., all TGDs and EGDs), and a file with a name of the form <file>.t-egds.txt that is empty. This oddity is due to how the dependencies were generated and how our prototype was written.
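Because the TGDs and EGDs of a second-order scenario share one file, a reader who wants them separated can classify each dependency by its head. The Python sketch below is a hypothetical helper that our prototype does not need: it assumes one dependency per line and treats a dependency as an EGD exactly when its head begins with an equality atom of the form =(t1,t2).

```python
def split_dependencies(lines):
    """Split dependency lines into (tgds, egds) according to their heads.

    Assumptions: each non-blank line holds one dependency of the form
    'body -> head .', and the head of an EGD starts with '=('.
    """
    tgds, egds = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        _, arrow, head = line.partition("->")
        if not arrow:
            raise ValueError(f"not a dependency: {line!r}")
        if head.lstrip().startswith("=("):
            egds.append(line)
        else:
            tgds.append(line)
    return tgds, egds
```

The function accepts any iterable of strings, so an open file object for a <file>.st-tgds.txt file can be passed directly.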

Reproducing Our Experiments

Use the following steps to reproduce our experiments.