Wednesday, April 07, 2010

WHERE OWL fails

Paul Hermans has written an insightful series of blog entries (part 1, part 2, part 3, part 4) in which he reports on his experiences trying to express some SKOS integrity constraints and inference rules with OWL 2. After failing to express those things with OWL 2, he then demonstrates how his goals can be easily achieved with SPARQL with the help of the SPIN framework.

His conclusions (the last sentences from his blog entries):
  1. "If you speak SPARQL fluently, it is fairly easy to define constraints on your RDF data using SPIN."
  2. "And the winner for constraint S13 is clearly SPIN."
  3. "Once again fairly easy to do with SPIN; a long study of the particularities of OWL2 DL restrictions to find out that this constraint cannot be expressed in OWL2 DL."
  4. "SPIN wins again."
Paul is in no way associated with TopQuadrant and we have not asked him to create those write-ups for marketing purposes - I discovered them by chance. Paul appears to be fluent with a large variety of technologies and makes balanced use of whatever tools and languages are most useful for his given tasks.

So why does OWL fail in those examples? In my opinion, these examples expose a fundamental design limitation of OWL: OWL is hard-coded against specific design patterns, but anything that goes beyond those patterns cannot be expressed. Furthermore, the choice of supported design patterns is misguided by theoretical assumptions about DL inferencing that are quite often irrelevant for practical purposes.

Let's look at a longer version of this answer. The data model of the Semantic Web is a graph structure consisting of RDF triples. The strength of RDF is that people can define their own ways of representing data and knowledge, and thus create arbitrary RDF graph patterns. Users are free to define classes with any number of associated properties, forming larger structures that go far beyond the triple level.

In order to check constraints or execute rules on those graph structures, a general graph matching language is needed. A strong candidate for this is SPARQL, especially its WHERE clause. The WHERE clause is able to match fairly complex sub-graph patterns and provides variable bindings that can be used to report constraint violations or to fire the right hand side of a rule.
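To make the idea of graph matching more tangible, here is a minimal sketch in plain Python (not SPARQL or SPIN itself) of how a WHERE-style pattern produces variable bindings that can then be reported as constraint violations. The toy matcher, the sample triples and the ex: names are all assumptions made for illustration; the check is only loosely modeled on the kind of SKOS label integrity condition discussed in Paul's posts.

```python
# Illustrative only: a toy matcher for basic graph patterns, not a SPARQL engine.

def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or compare against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # The whole triple matched; continue with the remaining patterns.
            yield from match(rest, triples, candidate)


# Hypothetical sample data.
TRIPLES = [
    ("ex:Dog", "skos:prefLabel", "dog"),
    ("ex:Dog", "skos:altLabel", "dog"),  # same literal used twice
    ("ex:Cat", "skos:prefLabel", "cat"),
    ("ex:Cat", "skos:altLabel", "feline"),
]

# Roughly: WHERE { ?c skos:prefLabel ?l . ?c skos:altLabel ?l }
VIOLATION_PATTERN = [
    ("?c", "skos:prefLabel", "?l"),
    ("?c", "skos:altLabel", "?l"),
]

violations = list(match(VIOLATION_PATTERN, TRIPLES))
for v in violations:
    print(f"{v['?c']} uses {v['?l']!r} as both prefLabel and altLabel")
```

Each binding that comes back identifies one offending concept and label. A rule would work the same way, except that the bindings would be used to construct new triples instead of a message.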

OWL, on the other hand, is not able to represent arbitrary RDF graph patterns, but only a sub-set of those patterns that the designers of OWL found useful. Many of those patterns have seemingly arbitrary restrictions, as illustrated by Paul's examples (e.g., mixing different property types is not allowed in property chains). OWL 2 and some of its implementations such as the OWL API have driven this approach to extremes, making it not even possible to represent those patterns syntactically. This is because OWL 2 is not based on the RDF data model and therefore cannot talk about RDF in general.

So if you want to ask a question that the OWL 2 designers have not anticipated, then you cannot use OWL.

To make matters worse, OWL 2 is heavily influenced by research from the field of Description Logics, which many real-world users find both artificial and unhelpful. The goal of DL is to find a "tractable" sub-set of logic that allows inference engines to "guarantee" that all possible questions will be answered in finite time. While this sounds like an attractive value proposition from a theoretical point of view, practical evidence shows that the sub-set selected for OWL DL does not cover enough real-world use cases (see Paul's entries). Furthermore, there is enough practical evidence suggesting that while OWL DL inferencing may terminate in finite time, this time might be after the heat death of the universe and therefore completely useless. Just look at the mailing list archives of popular OWL DL inference engines to read complaints about how slow those engines are in the real world. With SPARQL and SPIN you can of course also create very slow queries, but at least you have much greater flexibility and expressivity. And as with any language, a fair amount of engineering and experience allows you to prevent performance pitfalls. Likewise, you cannot throw any complex query at your SQL database and expect ideal response times. Engineering is needed.

In defense of OWL, there are lots of useful design patterns encoded in this language, and it is great that the community has a standard vocabulary to talk about classes and things like property cardinalities. There needs to be some standard to capture ontology design patterns, and OWL does a good job for many of them. But this makes OWL just one out of a catalog of vocabularies, on the same level as SKOS or FOAF or SIOC or GoodRelations. It's simply a good vocabulary to talk about classes, while SKOS is a good vocabulary to talk about taxonomies and GoodRelations is a good vocabulary to talk about business.

But for anything that is actionable for the real world, a combination of various vocabularies and a rich constraint and rule language like SPIN is needed.

Saturday, April 03, 2010

The SPIN Technology Stack

Regular readers of this blog may notice that I am a big fan of SPARQL-based technologies. In fact, most of my work in the last couple of years went into defining extensions to SPARQL, and implementing editing and debugging tools for those extensions. The results of this work have been made core features of the TopBraid Suite, one of the most successful industrial semantic development platforms.

The first of those SPARQL extensions was SPARQLMotion, which somehow emerged out of discussions on scripting languages between me and Dean Allemang (wow: he has a Wikipedia page) and other TopQuadrant colleagues. SPARQLMotion was published at the end of 2007 and has matured considerably in the last year, often driven by real-world feedback from our growing user base. SPARQLMotion is a visual scripting language that simplifies the development of data processing pipelines. In addition to its use in TopBraid Composer, SPARQLMotion scripts can be executed as TopBraid Live web services or used to drive TopBraid Ensemble applications.

A year later, at the end of 2008, we published the SPARQL Inferencing Notation (SPIN), a SPARQL-based rule and constraint checking language. SPIN also greatly extends SPARQL itself through its support for user-defined functions, magic properties and templates. SPARQLMotion now uses parts of SPIN for its base infrastructure, and SPIN functions can also be used in SPARQLMotion scripts. Major application areas of SPIN range from ontology mapping to rule-based systems and even computer games.

The newest addition to this family is UISPIN, published as a beta release with TopBraid Composer 3.3. UISPIN makes it possible to link RDF and OWL models with user interface descriptions that can be rendered as HTML or SVG documents. UISPIN will enable the creation of a new generation of dynamic business applications in which the rendering of content is entirely model-driven. UISPIN is also based on SPIN and borrows its ideas of procedural attachment to classes and pre-bound variables.

Taking a step back, we can now draw a pretty picture to illustrate how those languages fit perfectly together, forming the SPIN Technology Stack:

This pragmatic collection of technologies offers a fairly complete infrastructure for projects based on linked data and semantic web. Based only on RDF and SPARQL as well as bits of RDF Schema (and, if you like, OWL), the SPIN Stack covers a wide range of business needs:
  • RDFS/OWL + SPIN: rich, self-describing domain models
  • SPARQLMotion: executable behavior
  • UISPIN: model-driven dynamic user interfaces
Software developers may recognize that those three pieces correspond to the well-known Model-View-Controller (MVC) architecture pattern. The SPIN Stack now basically covers all aspects of classical software architecture. These languages are careful extensions of the official Semantic Web standards that take RDF/OWL and SPARQL out of the research labs and into the real world.

And the best thing is that there is a lot of energy behind SPARQL (with SPARQL 1.1 on its way) and lots of other extensions, online SPARQL endpoints and efficient SPARQL databases on the market. As this market grows and more and more developers become familiar with SPARQL, the SPIN Technology Stack will be a safe investment for companies that wish to create flexible solutions based on smart, self-describing data.

Friday, April 02, 2010

UISPIN Example: Documenting SPIN Functions

The UISPIN framework can be used to create HTML documents from templates that contain embedded SPARQL queries and expressions. You can attach those templates to the classes of your domain model to help the system find the most suitable rendering of the instances. In this blog entry I show how this technique can be applied to create ontology documentation. In particular I create a UISPIN model for rendering SPIN functions, but similar ideas can be applied to other language elements such as OWL classes. This example also demonstrates many key features of UISPIN, including user-defined elements and control structures.

The end result of this example will look like the following. Whenever a user navigates to a SPIN function (here: spl:object), then the Browser will display the HTML shown below.

The resulting HTML page displays the qname of the function in the heading, then the comment of the function, then a section listing all arguments of the function and finally the body query (if one exists).

In order to produce such a rendering, I have created a new file that contains the UISPIN definitions for the SPIN ontology. This file imports the SPIN namespace (http://spinrdf.org/spin) and the TUI namespace (http://uispin.org/tui, which in turn imports the HTML support for UISPIN). Since I want to document all SPIN functions, I have attached the following ui:instanceView to the class spin:Function.

The UISPIN snippet above basically states that all instances of spin:Function shall be rendered as an HTML div element containing an h2 etc. The snippet contains many SPARQL expressions in the {= ... } notation, and those expressions are executed when the page is being rendered. As usual, the variable ?this points to the current instance, i.e. an instance of spin:Function, and functions such as spl:object can be called to conveniently retrieve properties of ?this.

The snippet above also contains the control element ui:if that only inserts its child elements if the ui:condition evaluates to true.
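For readers who think in code, the following is a rough Python analogy of what happens at rendering time. It is not how UISPIN is implemented. Placeholders in the {= ... } notation are reduced here to plain property names, a block is emitted only if a condition holds, and the two ex: functions with their property values are made up for the example.

```python
import html
import re

# Hypothetical property values for two made-up functions.
DATA = {
    "ex:double": {
        "rdfs:comment": "Doubles its argument.",
        "spin:body": "SELECT ((?arg1 * 2) AS ?result) WHERE { }",
    },
    "ex:stub": {"rdfs:comment": "Has no body yet."},
}

# Matches placeholders such as {= rdfs:comment }.
PLACEHOLDER = re.compile(r"\{=\s*([\w:]+)\s*\}")


def render(template, this):
    """Fill each {= property } placeholder with that property's value of `this`."""
    def evaluate(m):
        return html.escape(DATA[this].get(m.group(1), ""))
    return PLACEHOLDER.sub(evaluate, template)


def instance_view(this):
    """Render one resource; `this` plays the role of the ?this variable."""
    parts = [
        f"<div><h2>{html.escape(this)}</h2>",
        render("<p>{= rdfs:comment }</p>", this),
    ]
    # Counterpart of a conditional element: emit the block only if a body exists.
    if "spin:body" in DATA[this]:
        parts.append(render("<pre>{= spin:body }</pre>", this))
    parts.append("</div>")
    return "".join(parts)


print(instance_view("ex:double"))
print(instance_view("ex:stub"))
```

The second call prints a div without a pre block, because the made-up ex:stub function has no body.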

Finally, there is also a user-defined UISPIN element, spin.ui:ArgumentsList, which is defined as shown in the next TopBraid Composer screenshot.

A user-defined UISPIN element has a unique identifier (URI) so that it can be shared and reused on the Semantic Web. It can have any number of arguments, declared with the spl:Argument SPIN template at spin:constraint. By the way, this is the same design pattern as for user-defined SPIN functions, templates and SPARQLMotion modules.

The core of any user-defined UISPIN class though is its ui:prototype. This is the snippet that will be inserted into the document where the element is used. In the prototype, the values of the arguments are pre-bound to the variables shown in bold face. In the example above, the value of the spin.ui:module argument will be bound to the variable ?module in the prototype. If you scroll up two screenshots, you can see that the call of the spin.ui:ArgumentsList element includes a value for spin.ui:module. This means that the current value of the variable ?this will be inserted as ?module into the prototype. The prototype itself contains a control element of type ui:forEach that traverses all arguments of the current module, and creates one table row and table data element for each of them.

Finally, the prototype also makes use of the control element ui:resourceView, which will insert the default rendering of the ui:resource into the target document. The values of ui:resource in this case are the arguments (?arg), and those are instances of spl:Argument. This class has the following ui:instanceView attached to it.

While the above snippet may be hard to read due to its formatting, you may see that it will display the predicate's qname, followed by its value type, followed by the word [Optional].
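The interplay of prototype, pre-bound arguments, iteration and nested views can also be sketched as ordinary Python functions. Again, this is only an analogy under assumed names: the function names, the ARGUMENTS table and the ex:concat example are hypothetical and not taken from the SPIN ontology.

```python
# Hypothetical argument declarations for one made-up function.
ARGUMENTS = {
    "ex:concat": [
        {"predicate": "sp:arg1", "valueType": "xsd:string", "optional": False},
        {"predicate": "sp:arg2", "valueType": "xsd:string", "optional": True},
    ],
}


def argument_view(arg):
    """Default rendering of one argument: name, value type, optional marker."""
    suffix = " [Optional]" if arg["optional"] else ""
    return f"{arg['predicate']} ({arg['valueType']}){suffix}"


def arguments_list(module):
    """Prototype of a user-defined element; `module` is pre-bound by the caller."""
    rows = []
    for arg in ARGUMENTS.get(module, []):  # one table row per argument
        rows.append(f"<tr><td>{argument_view(arg)}</td></tr>")
    return "<table>" + "".join(rows) + "</table>"


def function_view(this):
    """View of a function; it passes its own `this` on as the `module` argument."""
    return f"<div><h2>{this}</h2>{arguments_list(module=this)}</div>"


print(function_view("ex:concat"))
```

The point of the analogy is that the element is defined once and reused wherever it is called, with the caller deciding which resource ends up in the pre-bound variable.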

Let's quickly summarize the main concepts from this example:
  1. You can attach HTML snippets to classes using the property ui:instanceView.
  2. The variable ?this points to the current instance of the class.
  3. Views may contain user-defined elements, which have a prototype.
  4. In the prototype, the arguments of the element are pre-bound as variables.
  5. ui:if can be used to insert an HTML block based on a condition being true.
  6. ui:forEach can be used to repeat a block for each row of a SPARQL query.
  7. ui:resourceView will insert the default rendering (e.g. ui:instanceView) of a given resource.
Please give UISPIN a try by downloading TopBraid Composer ME 3.3 and let us know your feedback on the topbraid-users mailing list. Note that UISPIN is still in beta stage, and your input may shape its future.

Thursday, April 01, 2010

Charts and Business Reports with UISPIN

UISPIN makes it easy to develop new components that can be used like HTML tags in a document. Those components may consist of a complex snippet of HTML and other XML-based languages such as SVG. UISPIN components can be published on the web, where they are identified by their URI. UISPIN Charts is a library of reusable components for visualizing data on charts and maps. Because the library is based on Google Charts and Google Maps, there is no need to install anything - the components are entirely declarative and ready to use.

The UISPIN Charts page shows many example screenshots of the various kinds of charts and maps that are currently supported: Pie charts, Bar charts, Map charts, and Google Maps. More will be added in the future.

Here is a step-by-step tutorial illustrating how to use those charts. We will create the pie chart shown in the following screenshot.

Let's first look into the RDF model of the data that we want to visualize. This data is about election results, and is downloadable as a Turtle file from here. Download it into your TopBraid Composer ME 3.3 workspace to follow the exercise.

The data file defines the classes shown in the diagram below. Instances of the class ex:Election point to a collection of ex:ElectionResults. Each result links a party with a percentage (an xsd:float literal).

Here is a form view of the ex:Election instance that we want to visualize. There are four ex:ElectionResults and they are blank nodes.

In order to use the chart components, we need to create a file that imports the UISPIN charts vocabulary. We can either import this namespace into the data document itself, or we can create a new document that imports both the charts and the data model. Since we do not want to clutter the data file, we create a new RDF file and drag the charts namespace as well as the data model into the Imports View. The charts namespace is part of the system ontologies, under the TopBraid/UISPIN project. The imports view should look similar to the following screenshot.

Having both the domain model and the charts namespace in the same RDF model allows us to create links between the Election classes and suitable visualizations. Select the class ex:Election so that it shows up on the form. You will now see a property ui:instanceView on the form. This is where we will put the definition of the pie chart later.

But first let's think about the actual data. Pie charts visualize slices of data in proportion to one another. Each slice can have a label (here: the name of the political party). These value-label pairs form a table, and this table is the input to the pie chart. Select the instance of ex:Election (ex:AustralianFederalElection2006) and use the SPARQL View to run the following query:

This query returns the numeric value in the first column, and the label in the second. This happens to be exactly the format that the charts:PieChart component expects. Note that the query above uses a built-in feature of TopBraid Composer: in the SPARQL view, the variable ?this is pre-bound with the currently selected resource.
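If the shape of that table is hard to picture, here is a small Python sketch of the same idea outside of SPARQL. The class names mirror the data model described above, but the field names, the party names and the percentages are invented placeholders rather than the contents of the downloadable file, and the slice computation is only an assumption about what a pie chart does with such rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElectionResult:
    party: str          # label of the slice
    percentage: float   # numeric value of the slice


@dataclass(frozen=True)
class Election:
    label: str
    results: tuple      # collection of ElectionResult objects


def result_set(this):
    """Rows in the expected shape: numeric value first, label second."""
    return [(r.percentage, r.party) for r in this.results]


def slice_angles(rows):
    """Turn (value, label) rows into (label, degrees) pie slices."""
    total = sum(value for value, _ in rows)
    return [(label, 360.0 * value / total) for value, label in rows]


# Invented placeholder data, not real election figures.
election = Election(
    label="Example election",
    results=(
        ElectionResult("Party A", 40.0),
        ElectionResult("Party B", 30.0),
        ElectionResult("Party C", 20.0),
        ElectionResult("Party D", 10.0),
    ),
)

rows = result_set(election)
for label, degrees in slice_angles(rows):
    print(f"{label}: {degrees:.0f} degrees")
```

Keeping the rows as plain value-label pairs means the same table could feed a bar chart or an HTML table without changing how the data is selected.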

Go back to the class ex:Election and locate the ui:instanceView property on its form. We could now add an empty row and type in the UISPIN snippet of the pie chart, but in this exercise we take a different route: Open the context menu behind the ui:instanceView property name and select Create blank node... which opens a class selection dialog like the following.

You need to select the class charts:PieChart here. In order to find it quickly, click on the Group by namespace button in the lower left corner. After clicking on OK, a new anonymous instance of charts:PieChart will be created and displayed as a nested form as shown next.

This nested form will enumerate all potential arguments of the pie chart class, including charts:label and ui:resultSet. Scroll down to find the widget for ui:resultSet, add an empty row and copy and paste the SPARQL query from the query view into it:

Now, go to charts:label and select Add SPARQL expression from its context menu. This will open up an empty row, in which you can paste the expression ui:label(?this), which will insert the name of the current election into the pie chart. Finally, set html:width and html:height to 500 by 240 as shown below:

That's the complete definition of our pie chart, and closing the nested form you can see its XML syntax:

You could have entered this directly in XML syntax instead of going through the form. In any case, we can now go back to the instance of ex:Election and click on the Browser tab at the bottom of the editor window to get the screenshot from the top of this article.

This pie chart is now the default visualization of all instances of ex:Election. But since the charts:PieChart element is just one among many other available elements (including the whole range of HTML tags), we can turn the pie chart into a comprehensive report that contains other views (such as a charts:BarChart), an HTML table and whatever else we want to display.