SPIN Diff: Rule-based Comparison of RDF Models
One of the new features of the Maestro Edition of TopBraid Composer 3.1 is a simple yet very flexible diff tool that can be used to compare two versions of an RDF file or database. Diffing is a common requirement of collaborative modeling work, but conventional text-based diff tools often fail miserably on RDF-based data, because the same RDF graph can be serialized with different triple orderings, prefixes and blank node labels.
TopBraid's Diff feature is based on the SPARQL-based rule language SPIN. Our approach works as follows:
- We have defined a simple diff ontology that declares classes and properties describing the differences between two RDF models.
- The diff engine creates instances of those diff ontology classes, e.g. diff:AddedTripleDiff.
- The rules that construct those diff instances are expressed as SPARQL CONSTRUCT queries in SPIN.
- Any RDF-based browser such as TopBraid Composer or Ensemble can be used to browse the results of the diff engine, since those results are simply another RDF model.
- Likewise, any RDF-based processing language such as OWL, SPIN or SPARQLMotion can be used to post-process the output of the diff engine, for example to create more detailed diff reports for specific ontology design patterns.
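The division of labor in the list above can be illustrated outside of TopBraid with a small Python sketch. It is not how the SPIN engine is implemented: each model is just a Python set of (subject, predicate, object) strings, each rule is a function playing the role of a CONSTRUCT query over the two named graphs diff:old and diff:new, and the engine merges whatever triples the rules return, so the output is again a set of triples. The class name diff:DeletedTripleDiff, the use of rdf:predicate and rdf:object next to rdf:subject, and the blank-node labels are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]
# A rule reads the named graphs and returns the triples it constructs,
# loosely like a SPARQL CONSTRUCT query.
Rule = Callable[[Dict[str, Graph]], Graph]


def run_diff(old: Graph, new: Graph, rules: List[Rule]) -> Graph:
    """Fire every rule against diff:old and diff:new and merge the results."""
    graphs = {"diff:old": old, "diff:new": new}
    output: Graph = set()
    for rule in rules:
        output |= rule(graphs)
    return output


def deleted_triple_rule(graphs: Dict[str, Graph]) -> Graph:
    """Hypothetical rule: describe each triple that is missing from diff:new."""
    removed = sorted(graphs["diff:old"] - graphs["diff:new"])
    result: Graph = set()
    for i, (s, p, o) in enumerate(removed):
        node = f"_:d{i}"  # illustrative blank-node label for the diff instance
        result |= {
            (node, "rdf:type", "diff:DeletedTripleDiff"),
            (node, "rdf:subject", s),
            (node, "rdf:predicate", p),
            (node, "rdf:object", o),
        }
    return result


old = {
    ("difftest:Person", "rdfs:label", '"Person"'),
    ("difftest:Person", "rdfs:label", '"Human"'),
}
new = {("difftest:Person", "rdfs:label", '"Person"')}
report = run_diff(old, new, [deleted_triple_rule])
```

Extending this toy engine means appending another function to the rule list, which loosely mirrors attaching another rule to a class in the diff ontology.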
Let's first have a look at a simple example with the out-of-the-box behavior. The example ontology contains a class Person with an owl:Restriction attached to it, as well as labels and comments. I made some edits to the file and then ran the diff tool from TopBraid Composer's Model menu. After the old version of the file is selected, the diff engine runs the rules encoded in the diff.owl file and produces a new file containing the change records. The diff file also contains a stored SPARQL query, defined as a SPIN Template, that conveniently lists all Diff objects in a table. Use the Run SPARQL Query from SPIN Template button in Composer's main tool bar to open a diff report such as the following:
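The stored template is a SPARQL SELECT query; the Python sketch below only approximates what such a table query does, again treating the diff output as a set of triples. It collects every node typed with one of the given diff classes and reads its rdf:subject, rdf:predicate and rdf:object values, leaving a cell empty when a value is missing, much like an OPTIONAL pattern. The function name diff_table and the explicit set of diff classes are hypothetical; the real template may select diff objects differently, for example through a common superclass.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
Row = Tuple[str, str, str, str]


def diff_table(report: Set[Triple], diff_classes: Set[str]) -> List[Row]:
    """Tabulate every diff instance as (type, subject, predicate, object)."""

    def value(node: str, prop: str) -> str:
        # Some value of prop on node, or "" if there is none (like OPTIONAL in SPARQL).
        return next((o for (s, p, o) in report if s == node and p == prop), "")

    rows = [
        (cls, value(node, "rdf:subject"), value(node, "rdf:predicate"), value(node, "rdf:object"))
        for (node, p, cls) in report
        if p == "rdf:type" and cls in diff_classes
    ]
    return sorted(rows)


report = {
    ("_:a0", "rdf:type", "diff:AddedTripleDiff"),
    ("_:a0", "rdf:subject", "difftest:Person"),
    ("_:a0", "rdf:predicate", "rdfs:comment"),
    ("_:a0", "rdf:object", '"A human being"'),
}
for row in diff_table(report, {"diff:AddedTripleDiff"}):
    print(row)
```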
The diff output file also imports the current domain ontology, so you can double-click on any diff entry to navigate to the relevant subject or predicate. A closer look reveals that the diff report consists of instances such as the following:
This instance of type diff:AddedTripleDiff states that a new rdfs:comment "A human being" has been added to the class difftest:Person. As you can see, the details of the particular change can be queried using properties such as rdf:subject.
The instance above has been created with the following SPIN rule, which is attached to the diff:AddedTripleDiff class:
The trick is that the diff rules can query the old and the new triples through pre-defined named graphs (diff:old and diff:new), and then construct response objects based on comparisons across those two graphs. In the example above, the helper function spl:hasValue from the SPIN Standard Modules Library is used to check whether a given object was already present in the old graph.
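The logic described above can be sketched in Python, with the caveat that this is not the SPIN rule itself. Here has_value is a rough, hypothetical stand-in for spl:hasValue that only checks whether the exact triple exists in a graph; the function walks the diff:new graph, skips triples that diff:old already contains, and constructs one diff:AddedTripleDiff description per remaining triple. The rdf:predicate and rdf:object properties and the blank-node labels are assumptions.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]


def has_value(graph: Graph, subject: str, predicate: str, obj: str) -> bool:
    """Rough stand-in for spl:hasValue: is this exact triple in the graph?"""
    return (subject, predicate, obj) in graph


def added_triple_rule(graphs: Dict[str, Graph]) -> Graph:
    """Emulate a CONSTRUCT rule that creates diff:AddedTripleDiff instances."""
    old, new = graphs["diff:old"], graphs["diff:new"]
    # WHERE part: triples from the new graph that the old graph does not have.
    added = [(s, p, o) for (s, p, o) in sorted(new) if not has_value(old, s, p, o)]
    result: Graph = set()
    for i, (s, p, o) in enumerate(added):
        node = f"_:a{i}"  # hypothetical blank-node label for the new diff instance
        # CONSTRUCT template: describe the change as a diff object.
        result |= {
            (node, "rdf:type", "diff:AddedTripleDiff"),
            (node, "rdf:subject", s),
            (node, "rdf:predicate", p),
            (node, "rdf:object", o),
        }
    return result


old = {("difftest:Person", "rdfs:label", '"Person"')}
new = old | {("difftest:Person", "rdfs:comment", '"A human being"')}
diffs = added_triple_rule({"diff:old": old, "diff:new": new})
```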
As you can see, the whole diff engine is model-driven: it operates on RDF, produces RDF as output, and uses SPARQL rules to query the RDF graphs. The particular collection of rules to fire is stored in the diff.owl file, which makes the engine easy to adjust and extend. For example, let's assume that we want the engine to be smarter about OWL syntax and create custom diff objects for changes to OWL restrictions. In order to do that, we create a new class diff:ChangedRestrictionDiff and attach the following diff:rule to it:
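A Python approximation of such a restriction rule might look like the sketch below; the actual diff:rule is written in SPARQL and may be structured quite differently. Because restrictions are usually blank nodes whose labels differ between the two files, the sketch matches them by class, owl:onProperty and cardinality predicate rather than by node identity. Cardinality values are plain strings instead of typed literals, and storing the message in rdfs:comment is an assumption.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]
CARDINALITIES = ("owl:minCardinality", "owl:maxCardinality", "owl:cardinality")


def local_name(qname: str) -> str:
    return qname.split(":", 1)[-1]


def restriction_facts(graph: Graph) -> Dict[Tuple[str, str, str], str]:
    """Map (class, property, cardinality predicate) to its value for each restriction."""
    facts: Dict[Tuple[str, str, str], str] = {}
    for (cls, pred, r) in graph:
        if pred != "rdfs:subClassOf" or (r, "rdf:type", "owl:Restriction") not in graph:
            continue
        props = [o for (s, p, o) in graph if s == r and p == "owl:onProperty"]
        for (s, p, value) in graph:
            if s == r and p in CARDINALITIES:
                for prop in props:
                    facts[(cls, prop, p)] = value
    return facts


def changed_restriction_rule(old: Graph, new: Graph) -> Graph:
    """Create a diff:ChangedRestrictionDiff for each restriction whose value changed."""
    old_facts = restriction_facts(old)
    result: Graph = set()
    for i, (key, new_value) in enumerate(sorted(restriction_facts(new).items())):
        old_value = old_facts.get(key)
        if old_value is None or old_value == new_value:
            continue  # restriction is new or unchanged
        cls, prop, kind = key
        message = (f"{kind} restriction on property {local_name(prop)} at class "
                   f"{local_name(cls)} changed filler from {old_value} to {new_value}")
        node = f"_:c{i}"  # hypothetical blank-node label
        result |= {(node, "rdf:type", "diff:ChangedRestrictionDiff"),
                   (node, "rdfs:comment", f'"{message}"')}
    return result


old = {("difftest:Person", "rdfs:subClassOf", "_:r1"), ("_:r1", "rdf:type", "owl:Restriction"),
       ("_:r1", "owl:onProperty", "difftest:firstName"), ("_:r1", "owl:minCardinality", "0")}
new = {("difftest:Person", "rdfs:subClassOf", "_:x9"), ("_:x9", "rdf:type", "owl:Restriction"),
       ("_:x9", "owl:onProperty", "difftest:firstName"), ("_:x9", "owl:minCardinality", "1")}
changes = changed_restriction_rule(old, new)
```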
This will then create more helpful diff messages such as "owl:minCardinality restriction on property firstName at class Person changed filler from 0 to 1". You can create similar custom-tailored rules for other ontology design patterns.
The generic approach presented here has the advantage of being very flexible. Current limitations include cases such as the global renaming of a resource, which is currently reported as a collection of add and delete triple events. Higher-level rules are needed to analyze the lower-level diff results for such patterns and combine those "add/delete triple" events into "rename" events. Fortunately, given the open architecture of the system, anyone can contribute such rules without necessarily having to write any Java code. The beauty of generic model-driven engines is that this power is transferred to the end users. Furthermore, the behavior of the system is transparently encoded into rules that can be viewed by anyone, including software agents on the Semantic Web.
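One possible higher-level rule for the renaming case is sketched below in Python as a hypothetical heuristic that is not part of the diff tool: if every deleted triple mentioning an old resource reappears among the added triples once the old name is replaced by a new one, the pair is reported as a rename and those add/delete events are consumed. Whatever is left over remains an ordinary add or delete event. The example data, including the misspelled difftest:Persn and the class difftest:Student, is made up.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def replace_term(triple: Triple, old: str, new: str) -> Triple:
    """Return the triple with every occurrence of old replaced by new."""
    s, p, o = (new if term == old else term for term in triple)
    return (s, p, o)


def combine_renames(added: Set[Triple], deleted: Set[Triple]
                    ) -> Tuple[List[Tuple[str, str]], Set[Triple], Set[Triple]]:
    """Fold matching add/delete events into (old, new) rename events."""
    added, deleted = set(added), set(deleted)  # work on copies
    renames: List[Tuple[str, str]] = []
    for old in sorted({s for (s, _, _) in deleted}):
        touched = {t for t in deleted if old in t}
        if not touched:
            continue  # all triples mentioning old were consumed by an earlier rename
        for new in sorted({s for (s, _, _) in added} - {old}):
            rewritten = {replace_term(t, old, new) for t in touched}
            if rewritten <= added:
                renames.append((old, new))
                added -= rewritten
                deleted -= touched
                break
    return renames, added, deleted


deleted = {
    ("difftest:Persn", "rdf:type", "owl:Class"),
    ("difftest:Persn", "rdfs:label", '"Person"'),
    ("difftest:Student", "rdfs:subClassOf", "difftest:Persn"),
}
added = {
    ("difftest:Person", "rdf:type", "owl:Class"),
    ("difftest:Person", "rdfs:label", '"Person"'),
    ("difftest:Student", "rdfs:subClassOf", "difftest:Person"),
    ("difftest:Person", "rdfs:comment", '"A human being"'),
}
renames, rest_added, rest_deleted = combine_renames(added, deleted)
```

The heuristic is deliberately naive: it tries candidate names in sorted order and takes the first match, so a deleted resource whose triples were merely copied onto another resource would also be reported as renamed.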