Thursday, June 13, 2013

An Extended Turtle Format

Several RDF/OWL ontologies use blank node structures to represent complex objects. For example, OWL uses blank nodes to represent class expression trees consisting of intersections, unions and restrictions. Other ontologies use "reified" objects that combine a value with a unit of measurement. SPIN uses blank nodes to represent SPARQL queries that are attached to classes as rules and constraints. SWP uses blank nodes to represent HTML snippets that are attached to classes as instance views.

What all of these blank node structures have in common is that they look ugly and unpredictable in RDF files. Here is an example class with an SWP blank node in Turtle notation:

schema:Person
    a       owl:Class ;
    rdfs:label "Person"^^xsd:string ;
    rdfs:subClassOf owl:Thing ;
    ui:instanceView
        [ a       html:Div ;
          default:subject spin:_this ;
          ui:child
              [ a       swa:Object ;
                arg:predicate schema:familyName ;
                ui:childIndex 1
              ] ;
          ui:child
              [ a       swa:Object ;
                arg:predicate schema:givenName ;
                ui:childIndex 0
              ]
        ] ;

.

Even in this simple example you can see that nobody would want to edit such files by hand. Automated diffing tools will also struggle to compare versions, because the serialization may be different each time: Turtle doesn't know anything about ui:childIndex, for example, and as a result the surrounding blank nodes may be moved to different places each time. Look at any more complex file and you will see how ugly those structures can become...

Many years ago I used this blog to brainstorm about a plug-in mechanism for Turtle. I never got around to implementing this idea until this year, when I had enough user feedback to confirm that people do want to be able to edit SWP files by hand, and especially want a format that allows them to use off-the-shelf versioning systems. So for TopBraid 4.3 I have added a new format called Extended Turtle (*.ttlx) that looks like the following:

schema:Person
    a owl:Class ;
    rdfs:label "Person"^^xsd:string ;
    rdfs:subClassOf owl:Thing ;

    ui:instanceView """

<div default:subject=\"{= ?this }\">
    <swa:Object arg:predicate=\"{= schema:givenName }\"/>
    <swa:Object arg:predicate=\"{= schema:familyName }\"/>
</div> """^^ui:Literal ;

.

Extended Turtle uses a pre-processor that turns certain blank node structures into literals, and a post-processor that parses the literals back into blank nodes as part of the loading. I have currently only implemented this idea for SWP expressions, but it applies to any blank node structure that also has a parsable text notation. For OWL this would be the Manchester Syntax, for SPIN it would be SPARQL text syntax, and for SWP it is the XML tag syntax. The post-processor simply looks at all literals that have a special datatype (here: ui:Literal) and runs the text parser over those. If this idea were ever considered for standardization, a minimum implementation could require that the datatype (here: ui:Literal) resolves to a web service that takes the literal and the prefixes as input and returns a valid Turtle blank node snippet. Obviously, real-world engines would want to have this parsing done client-side, as plug-ins to the Turtle parser (and this is what TopBraid does).
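
To make the post-processing step more concrete, here is a minimal Python sketch of the idea. It is not TopBraid's implementation: the names Literal, BNode, parse_xml_snippet and post_process are hypothetical, triples are plain tuples rather than a real RDF API, and the toy parser only maps XML elements and attributes to blank nodes without handling SWP's {= ... } expressions. It only illustrates how a loader could scan for literals with a registered datatype and hand them, together with the prefixes, to a plug-in parser.

```python
import itertools
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


@dataclass(frozen=True)
class BNode:
    id: str


_ids = itertools.count()


def new_bnode():
    """Create a fresh blank node (hypothetical helper)."""
    return BNode(f"b{next(_ids)}")


def parse_xml_snippet(text, prefixes):
    """Toy plug-in parser: each XML element becomes one blank node.

    Returns (root_node, triples). The prefixes are declared on a wrapper
    element so that prefixed tags such as swa:Object can be parsed.
    """
    decls = " ".join(f'xmlns:{p}="{ns}"' for p, ns in prefixes.items())
    wrapper = ET.fromstring(f"<wrapper {decls}>{text}</wrapper>")
    triples = []

    def visit(elem):
        node = new_bnode()
        triples.append((node, "rdf:type", elem.tag))
        for name, value in elem.attrib.items():
            triples.append((node, name, value))
        # The child order is taken from the text, so it is stable on disk.
        for index, child in enumerate(elem):
            child_node = visit(child)
            triples.append((node, "ui:child", child_node))
            triples.append((child_node, "ui:childIndex", index))
        return node

    return visit(wrapper[0]), triples


def post_process(triples, plugins, prefixes):
    """Replace literals whose datatype has a plug-in with blank node trees."""
    result = []
    for s, p, o in triples:
        parser = plugins.get(o.datatype) if isinstance(o, Literal) else None
        if parser is None:
            result.append((s, p, o))  # ordinary triple, kept as-is
            continue
        root, extra = parser(o.lexical, prefixes)
        result.append((s, p, root))
        result.extend(extra)
    return result
```

A loader would call post_process(triples, {"ui:Literal": parse_xml_snippet}, prefixes) after the regular Turtle parse. Literals with other datatypes, such as the xsd:string label, pass through untouched, and a pre-processor for saving would do the reverse walk from blank nodes to text.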

One advantage of this notation is that the files are still valid Turtle, so that parsers that do not have those plug-ins can at least display something. In any case, the main goal of this work was to address the SWP serialization issue, and I believe the solution above succeeds at that.
