Goal-Driven Query Answering over First- and Second-Order Dependencies with Equality: Code, Datasets, and Experimental Results

Efthymia Tsamoura and Boris Motik

This Web page contains the code, the datasets, and the results of the experiments described in the paper "Goal-Driven Query Answering over First- and Second-Order Dependencies with Equality", which is currently under review at the Artificial Intelligence Journal. The paper describes a novel method that can answer a query over dependencies without computing the chase in full; instead, it aims to derive only the consequences relevant to the query.

Code

The code of our prototype is included in the code.zip archive. The prototype implements a mini RAM-based database that can load datasets and dependencies in an extension of the ChaseBench format. The system was written in Java as a wrapper around the well-known RDFox system. RDFox is written in C++ and thus its executable needs to be adapted to the target platform. The code archive contains the lib/JRDFox.jar file, which contains a research version of RDFox that can be run on modern x86_64 systems running a recent version of the Linux operating system. We cannot provide RDFox releases for other platforms at the moment.

Test Scenarios

First-Order Scenarios

The first-order test scenarios are the same as in ChaseBench; however, we changed the syntax of equality atoms from t1 = t2 to =(t1,t2) in our work in order to simplify parsing. All first-order scenarios are given below in full.
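To illustrate the syntax change, the following is a minimal sketch of how an infix equality atom could be rewritten into the prefix form. The class EqualityRewriter and its regular expression are hypothetical and are not part of our prototype; the sketch assumes that a term is a variable such as ?X, a double-quoted constant, or a bare token, and that no equality sign occurs inside a quoted constant.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the prototype: rewrites infix equality
// atoms of the form "t1 = t2" into the prefix form "=(t1,t2)".
final class EqualityRewriter {
    // Assumption: a term is a variable (?X), a double-quoted constant, or a bare token.
    private static final String TERM = "(\\?\\w+|\"[^\"]*\"|\\w+)";

    // Two terms separated by an equality sign with optional whitespace around it.
    private static final Pattern INFIX_EQUALITY =
            Pattern.compile(TERM + "\\s*=\\s*" + TERM);

    static String rewrite(String dependency) {
        // $1 and $2 refer to the left and the right term of the matched equality.
        return INFIX_EQUALITY.matcher(dependency).replaceAll("=($1,$2)");
    }
}
```

For example, applying rewrite to an illustrative EGD such as r(?X,?Y), r(?X,?Z) -> ?Y = ?Z . changes only the head of the dependency, and text that is already in the prefix form is left unchanged.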

Each test scenario consists of a base instance stored as one CSV file per relation without a header, a set of dependencies, and a set of queries. The dependencies are usually organised into source-to-target TGDs (stored in a file with a name of the form <file>.st-tgds.txt), target TGDs (stored in a file with a name of the form <file>.t-tgds.txt), and target EGDs (stored in a file with a name of the form <file>.t-egds.txt). Below is an example of a source-to-target TGD.

    v0(?X1,?X2,?X7,?X8) -> m87004(?X1,?X2,?X7,?X8), m298004(?X2,?X3,?X9,?X10), m113004(?X3,?X4,?X11,?X12) .
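Because the CSV files of the base instance have no header, every non-empty line is a fact of the relation. The following is a minimal sketch of reading such lines; the class HeaderlessCsvReader is hypothetical and does not reflect how our prototype loads data. The sketch assumes that values contain no quoted commas, so a plain split on commas is sufficient.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the prototype: turns the lines of a
// header-less CSV file into one String[] per fact.
final class HeaderlessCsvReader {
    static List<String[]> parseFacts(List<String> lines) {
        List<String[]> facts = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines, such as a trailing newline
            }
            // There is no header to skip: the first line is already a fact.
            // The limit -1 keeps empty trailing values.
            // Assumption: values contain no quoted commas.
            facts.add(line.split(",", -1));
        }
        return facts;
    }
}
```

The lines can be obtained, for example, using java.nio.file.Files.readAllLines on the CSV file of a relation.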

In addition, each test scenario comes with a set of queries, each stored in a separate file. Below is an example of a query.

    q01(?m63004_c0) <- m63004(?m63004_c0,?m63004_c1,?m63004_c2,?m63004_c3), m197004(?m63004_c0,?m197004_c1,?m197004_c2,?m197004_c3) .
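A query line of this form can be split into its head and its body atoms as in the following minimal sketch. The class QueryLine is hypothetical and is not the parser used by our prototype; the sketch assumes that commas and parentheses do not occur inside constants.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the prototype: splits a query line of the
// form "head <- atom1, atom2 ." into the head and the list of body atoms.
final class QueryLine {
    final String head;
    final List<String> bodyAtoms;

    private QueryLine(String head, List<String> bodyAtoms) {
        this.head = head;
        this.bodyAtoms = bodyAtoms;
    }

    static QueryLine parse(String line) {
        String text = line.trim();
        if (text.endsWith(".")) {
            text = text.substring(0, text.length() - 1); // drop the final period
        }
        int arrow = text.indexOf("<-");
        if (arrow < 0) {
            throw new IllegalArgumentException("Missing '<-' in: " + line);
        }
        String head = text.substring(0, arrow).trim();
        return new QueryLine(head, splitAtoms(text.substring(arrow + 2)));
    }

    // Splits on commas that are not nested inside parentheses.
    // Assumption: constants contain no commas or parentheses.
    private static List<String> splitAtoms(String body) {
        List<String> atoms = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                atoms.add(body.substring(start, i).trim());
                start = i + 1;
            }
        }
        String last = body.substring(start).trim();
        if (!last.isEmpty()) {
            atoms.add(last);
        }
        return atoms;
    }
}
```

Applied to the query above, parse yields the head q01(?m63004_c0) and two body atoms, one over m63004 and one over m197004.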

Second-Order Scenarios

The second-order scenarios listed below were generated using the algorithm presented in Section 5 of our paper. The syntax used is the same as for first-order scenarios. However, each scenario contains a file with a name of the form <file>.st-tgds.txt that contains all dependencies (i.e., all TGDs and EGDs), and a file with a name of the form <file>.t-egds.txt that is empty. This oddity is due to how the dependencies were generated and how our prototype was written.

Reproducing Our Experiments

Use the following steps to reproduce our experiments.

Download and compile our code, and make sure that the lib/JRDFox.jar library is in your classpath.
Download the relevant scenarios and unzip them all into a common root directory.
Set the variable s_scenariosRoot in the examples/uk/ac/ox/cs/goalDriven/chaseBench/ChaseInstance.java file to the root location of your scenarios. For example, if you unzipped all scenarios to a directory called /Users/Agent007/Scenarios, set the variable as follows.
```
    public static File s_scenariosRoot = new File("/Users/Agent007/Scenarios");
```
To run the chase in full on a first-order scenario, start the main() method of the examples/uk/ac/ox/cs/goalDriven/chaseBench/RunFOChase.java file. To choose which scenario to run, set the scenarioName, scenarioNameSuffix, and dataVariantName variables as follows.
- For LUBM, set scenarioName to "LUBM", and dataVariantName to "100" or "01K".
- For deep, set scenarioName to "deep", and scenarioNameSuffix to "200" or "300".
- For Ont-256, set scenarioName to "Ont-256".
To run the chase in full on a second-order scenario, start the main() method of the examples/uk/ac/ox/cs/goalDriven/chaseBench/RunSOChase.java file. To choose which scenario to run, set the scenarioName variable to "Gen-26", "Gen-27", or "Gen-28", and the dataVariantName variable to "1000", "5000", or "10000".
To run our goal-driven algorithm, start the main() method of the examples/uk/ac/ox/cs/goalDriven/chaseBench/RunFOPipeline.java or the examples/uk/ac/ox/cs/goalDriven/chaseBench/RunSOPipeline.java file, depending on whether the scenario is first- or second-order. The scenario can be selected as in the previous two cases. Furthermore, all queries of the scenario are answered if the queryFileName variable is empty; otherwise, only the query stored in the specified file is answered. Finally, the transformation variable determines which subset of the pipeline to run.
- Use TRANSFORMATION.REL to run just the relevance analysis.
- Use TRANSFORMATION.MAGIC to run just the magic sets transformation.
- Use TRANSFORMATION.ALL to run both the relevance analysis and the magic sets transformation.
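The relationship between the three options can be summarised by the following minimal sketch. The enum TransformationSketch is hypothetical and only mirrors the options listed above; the TRANSFORMATION type in our code may be declared differently, and the order in which the two stages are listed for ALL is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the TRANSFORMATION type of the prototype: records
// which pipeline stages each option enables.
enum TransformationSketch {
    REL(true, false),   // relevance analysis only
    MAGIC(false, true), // magic sets transformation only
    ALL(true, true);    // both stages

    final boolean runsRelevanceAnalysis;
    final boolean runsMagicSets;

    TransformationSketch(boolean runsRelevanceAnalysis, boolean runsMagicSets) {
        this.runsRelevanceAnalysis = runsRelevanceAnalysis;
        this.runsMagicSets = runsMagicSets;
    }

    // Names of the enabled stages.
    // Assumption: the relevance analysis is listed before the magic sets transformation.
    List<String> stages() {
        List<String> stages = new ArrayList<>();
        if (runsRelevanceAnalysis) {
            stages.add("relevance analysis");
        }
        if (runsMagicSets) {
            stages.add("magic sets");
        }
        return stages;
    }
}
```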