Tuesday, January 25, 2011

A Textual Syntax for SPARQLMotion

UPDATE (2012-07-04): TopBraid 3.6.1 is the last version supporting this syntax. It is now possible to embed SPARQLMotion scripts into SWP documents, and this provides very similar capabilities (and more).

SPARQLMotion is an RDF-based scripting language that lends itself to being presented and edited graphically as data processing pipelines. Many of our customers are using SPARQLMotion, and we are constantly extending and refining the tools to make them more powerful. One of the recent enhancements that made it into TopBraid 3.4 is support for an alternative textual notation for SPARQLMotion. A spec for this notation can be found here:

The TopBraid Composer screenshot below shows an example of this XML-based notation for SPARQLMotion in the sm:bodyScript field.

The script above is stored in the same RDF-based format as other SPARQLMotion scripts, and can still be visualized graphically:

In the few years since SPARQLMotion was created, several people have asked for a notation that can be edited with conventional text editing tools. One advantage of a text-based notation is that large-scale refactorings, such as moving things around, become easier. Editing text is sometimes simply faster, and there is no need to "invent" artificial URIs for the nodes in a script. A great plus of the XML-based notation is that it becomes easy to insert SPARQL Web Pages (aka UISPIN) snippets directly into a single document. This can significantly accelerate the development of SPARQL-based web services.

The XML notation does have various limitations, though. In particular, it is only suitable for a subset of SPARQLMotion, because a linear notation has no concept of multiple predecessor nodes.
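To make the predecessor limitation concrete, here is a minimal Python sketch that flags the nodes of a pipeline that have more than one predecessor. It is only an illustration: the edge-list representation, the function name and the module names (LoadData, Merge and so on) are hypothetical and are not taken from the SPARQLMotion spec or from TopBraid.

```python
from collections import defaultdict


def nodes_with_multiple_predecessors(edges):
    """Return the sorted names of nodes that have more than one predecessor.

    `edges` is a list of (predecessor, successor) pairs. This representation
    is an assumption made for the example, not SPARQLMotion's own model.
    """
    predecessors = defaultdict(set)
    for source, target in edges:
        predecessors[target].add(source)
    return sorted(
        node for node, sources in predecessors.items() if len(sources) > 1
    )


# Hypothetical pipeline that forms a simple chain.
linear = [("LoadData", "ApplyConstruct"), ("ApplyConstruct", "ReturnRDF")]

# Hypothetical pipeline in which two branches feed into one node.
merging = [("LoadA", "Merge"), ("LoadB", "Merge"), ("Merge", "ReturnRDF")]

print(nodes_with_multiple_predecessors(linear))   # []
print(nodes_with_multiple_predecessors(merging))  # ['Merge']
```

A script for which this check returns an empty list is a candidate for a linear notation; one like the second example would presumably have to stay in the graphical, RDF-based form.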

Monday, January 10, 2011

Understanding SPARQL Rules with the SPIN Statistics View

One of the new features of TopBraid Composer 3.4 is a view called SPIN Statistics. Whenever this view is open and you run some SPARQL Rules, it records the execution time of each individual rule. When the run has completed, you can browse the performance characteristics of each rule, grouped by invocations or by the associated class. In the following screenshot, the SPIN rules for OWL 2 RL executed over the pizza ontology are recorded:
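The general idea of recording per-rule timings and grouping them afterwards can be sketched in a few lines of Python. This is not how TopBraid implements the view; the RuleStats class, its methods and the rule and class names are all hypothetical, and each "rule" is just a plain Python callable.

```python
import time
from collections import defaultdict


class RuleStats:
    """Hypothetical recorder of one timing entry per rule invocation."""

    def __init__(self):
        # Each record is a tuple: (rule_name, class_name, elapsed_seconds).
        self.records = []

    def run(self, rule_name, class_name, rule_fn, *args):
        """Execute rule_fn, record how long it took, and return its result."""
        start = time.perf_counter()
        result = rule_fn(*args)
        elapsed = time.perf_counter() - start
        self.records.append((rule_name, class_name, elapsed))
        return result

    def totals_by(self, key_index):
        """Group records by rule (key_index=0) or by class (key_index=1).

        Returns a dict mapping each key to (invocation_count, total_seconds).
        """
        totals = defaultdict(lambda: [0, 0.0])
        for record in self.records:
            entry = totals[record[key_index]]
            entry[0] += 1
            entry[1] += record[2]
        return {key: tuple(value) for key, value in totals.items()}


stats = RuleStats()
# Stand-in workloads; a real rule would run a SPARQL CONSTRUCT instead.
stats.run("rule-a", "ClassX", lambda: sum(range(10_000)))
stats.run("rule-a", "ClassY", lambda: sum(range(10_000)))
stats.run("rule-b", "ClassX", lambda: sorted(range(1_000), reverse=True))

for name, (count, seconds) in stats.totals_by(0).items():
    print(f"{name}: {count} invocation(s), {seconds:.6f}s total")
```

Because the recorder keeps raw per-invocation records, the same data can be grouped by rule or by class after the fact, and further runs simply append to it.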

According to the statistics in this screenshot, the rule that implements the transitivity of rdfs:subClassOf took the largest fraction of the time. Statistics like these can help identify performance bottlenecks, and may also be useful for understanding what happens inside the rule engine. The view can be filled incrementally, e.g. to accumulate how certain rules fire over different data sets.
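For readers unfamiliar with the transitivity rule mentioned above, the following Python sketch shows the inference it performs: if A is a subclass of B and B is a subclass of C, then A is a subclass of C, repeated until nothing new is found. The function name and the naive fixpoint loop are assumptions made for illustration, and the class names are merely pizza-flavoured examples rather than data from the screenshot. The actual rule is a SPARQL rule and the engine may evaluate it quite differently.

```python
def subclass_closure(pairs):
    """Return the transitive closure of (subclass, superclass) pairs.

    Naive fixpoint: keep joining the relation with itself until a pass
    adds no new pair.
    """
    closure = set(pairs)
    while True:
        inferred = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        }
        if inferred <= closure:
            return closure
        closure |= inferred


# Illustrative class hierarchy (hypothetical names).
asserted = {
    ("Margherita", "NamedPizza"),
    ("NamedPizza", "Pizza"),
    ("Pizza", "Food"),
}

for sub, sup in sorted(subclass_closure(asserted) - asserted):
    print(f"inferred: {sub} rdfs:subClassOf {sup}")
```

Each pass of this naive loop compares every pair of statements, which hints at why a rule of this shape can account for a large share of the run time on a deep class hierarchy.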