Efthymia Tsamoura and Boris Motik
This Web page contains the code, the datasets, and the results of the experiments described in the paper Goal-Driven Query Answering over First- and Second-Order Dependencies with Equality that is currently under review at the Artificial Intelligence Journal. The paper describes a novel method that can answer a query over dependencies without computing the chase in full — that is, by aiming to derive only the consequences relevant to the query.
The code of our prototype is included in the code.zip archive. The prototype implements a mini RAM-based database that can load datasets and dependencies in an extension of the ChaseBench format. The system was written in Java as a wrapper around the well-known RDFox system. RDFox is written in C++ and thus its executable needs to be adapted to the target platform. The code archive contains the lib/JRDFox.jar file, which contains a research version of RDFox that can be run on modern x86_64 systems running a recent version of the Linux operating system. We cannot provide RDFox releases for other platforms at the moment.
The first-order test scenarios are the same as in ChaseBench; however, we changed the syntax of equality atoms from t1 = t2 to =(t1,t2) in our work in order to simplify parsing. All first-order scenarios are given below in full.
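The syntactic change to equality atoms can be illustrated with a small sketch. The class and method names below are hypothetical and are not part of the prototype. The regular expression assumes that a term contains no whitespace, commas, parentheses, or equality signs, which is a simplification rather than the full ChaseBench grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the prototype: rewrites infix equality
// atoms "t1 = t2" into the prefix form "=(t1,t2)".
class EqualityAtomRewriter {
    // Assumption: a term is a run of characters that contains no whitespace,
    // comma, parenthesis, or '='.
    private static final Pattern INFIX_EQUALITY =
        Pattern.compile("([^\\s,()=]+)\\s*=\\s*([^\\s,()=]+)");

    // Replaces every infix equality atom in the given dependency text.
    static String rewrite(String dependency) {
        Matcher matcher = INFIX_EQUALITY.matcher(dependency);
        return matcher.replaceAll("=($1,$2)");
    }
}
```

Under these assumptions, rewrite("r(?X,?Y), r(?X,?Z) -> ?Y = ?Z .") yields "r(?X,?Y), r(?X,?Z) -> =(?Y,?Z) .", and text that already uses the prefix form is left unchanged.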
Each test scenario consists of a base instance stored as one CSV file per relation without a header, a set of dependencies, and a set of queries. The dependencies are usually organised by distinguishing source-to-target TGDs (stored in a file with a name of the form <file>.st-tgds.txt), target TGDs (stored in a file with a name of the form <file>.t-tgds.txt), and target EGDs (stored in a file with a name of the form <file>.t-egds.txt). Below is an example of a source-to-target TGD.
v0(?X1,?X2,?X7,?X8) -> m87004(?X1,?X2,?X7,?X8), m298004(?X2,?X3,?X9,?X10), m113004(?X3,?X4,?X11,?X12) .
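In a TGD such as the one above, variables that occur in the head but not in the body are existentially quantified. The following sketch computes them for a dependency written in this syntax. The class and its methods are hypothetical illustrations rather than part of the prototype, and the sketch assumes that variables consist of '?' followed by letters, digits, or underscores.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the prototype.
class TgdInspector {
    // Assumption: a variable is '?' followed by letters, digits, or underscores.
    private static final Pattern VARIABLE = Pattern.compile("\\?\\w+");

    // Collects the variables of a formula in order of first occurrence.
    static Set<String> variablesOf(String formula) {
        Set<String> variables = new LinkedHashSet<>();
        Matcher matcher = VARIABLE.matcher(formula);
        while (matcher.find())
            variables.add(matcher.group());
        return variables;
    }

    // Returns the head variables that do not occur in the body.
    static Set<String> existentialVariables(String tgd) {
        int arrow = tgd.indexOf("->");
        if (arrow < 0)
            throw new IllegalArgumentException("No '->' found in: " + tgd);
        Set<String> result = variablesOf(tgd.substring(arrow + 2));
        result.removeAll(variablesOf(tgd.substring(0, arrow)));
        return result;
    }
}
```

For the example above, existentialVariables returns ?X3, ?X9, ?X10, ?X4, ?X11, and ?X12, in that order.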
In addition, each test scenario comes with a set of queries, each stored in a separate file. Below is an example of a query.
q01(?m63004_c0) <- m63004(?m63004_c0,?m63004_c1,?m63004_c2,?m63004_c3), m197004(?m63004_c0,?m197004_c1,?m197004_c2,?m197004_c3) .
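A query in this syntax can be split at the "<-" symbol into a head, whose arguments are the answer variables, and a body. The sketch below shows one way to extract both parts. The class is hypothetical and not part of the prototype, and it assumes that predicate names consist of letters, digits, and underscores and that argument lists contain no nested parentheses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the prototype.
class QueryInspector {
    // Assumption: an atom is a predicate name followed by a parenthesised,
    // comma-separated argument list without nested parentheses.
    private static final Pattern ATOM = Pattern.compile("(\\w+)\\(([^()]*)\\)");

    // Returns the arguments of the query head, that is, the answer variables.
    static List<String> answerVariables(String query) {
        Matcher matcher = ATOM.matcher(query.substring(0, arrowIndex(query)));
        if (!matcher.find())
            throw new IllegalArgumentException("No head atom in: " + query);
        List<String> variables = new ArrayList<>();
        for (String argument : matcher.group(2).split(","))
            if (!argument.trim().isEmpty())
                variables.add(argument.trim());
        return variables;
    }

    // Returns the predicate names of the body atoms in order of occurrence.
    static List<String> bodyPredicates(String query) {
        Matcher matcher = ATOM.matcher(query.substring(arrowIndex(query) + 2));
        List<String> predicates = new ArrayList<>();
        while (matcher.find())
            predicates.add(matcher.group(1));
        return predicates;
    }

    // Locates the "<-" symbol that separates the head from the body.
    private static int arrowIndex(String query) {
        int arrow = query.indexOf("<-");
        if (arrow < 0)
            throw new IllegalArgumentException("No '<-' found in: " + query);
        return arrow;
    }
}
```

For the query above, the only answer variable is ?m63004_c0 and the body predicates are m63004 and m197004.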
The second-order scenarios listed below were generated using the algorithm presented in Section 5 of our paper. The syntax used is the same as for first-order scenarios. However, each scenario contains a file with a name of the form <file>.st-tgds.txt that contains all dependencies (i.e., all TGDs and EGDs), and a file with a name of the form <file>.t-egds.txt that is empty. This oddity is due to how the dependencies were generated and how our prototype was written.
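The file naming convention shared by the first- and second-order scenarios can be summarised in a short sketch. The class, the enum, and the file name used in the usage note are hypothetical; only the three suffixes are taken from the descriptions above.

```java
import java.util.Optional;

// Hypothetical helper, not part of the prototype: maps a dependency file
// name to the kind of dependencies it is expected to hold.
class DependencyFiles {
    enum Kind {
        SOURCE_TO_TARGET_TGDS(".st-tgds.txt"),
        TARGET_TGDS(".t-tgds.txt"),
        TARGET_EGDS(".t-egds.txt");

        final String suffix;

        Kind(String suffix) {
            this.suffix = suffix;
        }
    }

    // Returns the kind whose suffix the file name ends with, if any.
    static Optional<Kind> classify(String fileName) {
        for (Kind kind : Kind.values())
            if (fileName.endsWith(kind.suffix))
                return Optional.of(kind);
        return Optional.empty();
    }
}
```

For instance, classify("example.st-tgds.txt") returns SOURCE_TO_TARGET_TGDS. In a second-order scenario, a file classified this way holds all TGDs and EGDs, so the kind reflects only the file name and not the file contents.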
Use the following steps to reproduce our experiments.
Ensure that the lib/JRDFox.jar library is in your classpath.
Set the s_scenariosRoot variable in the examples/uk/ac/ox/cs/goalDriven/chaseBench/ChaseInstance.java file to the root location of your scenarios. For example, if you unzipped all scenarios to a directory called /Users/Agent007/Scenarios, set the variable as follows.
public static File s_scenariosRoot = new File("/Users/Agent007/Scenarios");
Run the main() method of the examples/uk/ac/ox/cs/goalDriven/chaseBench/RunFOChase.java file. To choose which scenario to run, set the scenarioName, scenarioNameSuffix, and dataVariantName variables as follows.
Set scenarioName to "LUBM", and dataVariantName to "100" or "01K".
Set scenarioName to "deep", and scenarioNameSuffix to "200" or "300".
Set scenarioName to "Ont-256".
Run the main() method of the examples/uk/ac/ox/cs/goalDriven/chaseBench/RunSOChase.java file. To choose which scenario to run, set the scenarioName variable to "Gen-26", "Gen-27", or "Gen-28", and the dataVariantName variable to "1000", "5000", or "10000".
Run the main() method of the examples/uk/ac/ox/cs/goalDriven/chaseBench/RunFOPipeline.java or the examples/uk/ac/ox/cs/goalDriven/chaseBench/RunSOPipeline.java file, depending on whether the scenario is first- or second-order. The scenario can be selected as in the previous two cases. Furthermore, all queries of the scenario are answered if the queryFileName variable is empty, and otherwise only the query stored in the specified file is answered. Finally, the transformation variable determines which subset of the pipeline to run.
Use TRANSFORMATION.REL to run just the relevance analysis.
Use TRANSFORMATION.MAGIC to run just the magic sets transformation.
Use TRANSFORMATION.ALL to run both the relevance analysis and the magic sets transformation.
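The pipeline settings described above can be pictured with the following sketch. It is not the prototype's code: the TRANSFORMATION enum is a hypothetical stand-in whose constant names are taken from the text, and the variable types, the use of an empty string for an empty queryFileName, and the describe() method are assumptions made for illustration.

```java
// Hypothetical stand-in for the prototype's TRANSFORMATION type; only the
// constant names REL, MAGIC, and ALL are taken from the text.
enum TRANSFORMATION {
    REL(true, false),
    MAGIC(false, true),
    ALL(true, true);

    final boolean runsRelevanceAnalysis;
    final boolean runsMagicSets;

    TRANSFORMATION(boolean runsRelevanceAnalysis, boolean runsMagicSets) {
        this.runsRelevanceAnalysis = runsRelevanceAnalysis;
        this.runsMagicSets = runsMagicSets;
    }
}

// Hypothetical settings holder; the variable names follow the text, while
// their types and the use of "" for "empty" are assumptions.
class PipelineSettingsSketch {
    static String scenarioName = "LUBM";
    static String dataVariantName = "100";
    static String queryFileName = "";
    static TRANSFORMATION transformation = TRANSFORMATION.ALL;

    // Summarises which queries are answered and which stages are run.
    static String describe() {
        String queries = queryFileName.isEmpty() ? "all queries" : "query " + queryFileName;
        return scenarioName + "/" + dataVariantName + ": " + queries
            + ", relevance=" + transformation.runsRelevanceAnalysis
            + ", magic=" + transformation.runsMagicSets;
    }
}
```

With the values shown, describe() returns "LUBM/100: all queries, relevance=true, magic=true".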