Wednesday, April 07, 2010

WHERE OWL fails

Paul Hermans has written an insightful series of blog entries (part 1, part 2, part 3, part 4) in which he reports on his experiences trying to express some SKOS integrity constraints and inference rules with OWL 2. After failing to express those things with OWL 2, he then demonstrates how his goals can be easily achieved with SPARQL with the help of the SPIN framework.

His conclusions (the last sentences from his blog entries):
  1. "If you speak SPARQL fluently, it is fairly easy to define constraints on your RDF data using SPIN."
  2. "And the winner for constraint S13 is clearly SPIN."
  3. "Once again fairly easy to do with SPIN; a long study of the particularities of OWL2 DL restrictions to find out that this constraint cannot be expressed in OWL2 DL."
  4. "SPIN wins again."
Paul is in no way associated with TopQuadrant, and we did not ask him to create those write-ups for marketing purposes - I discovered them by chance. Paul appears to be fluent in a large variety of technologies and makes balanced use of whatever tools and languages are most useful for the task at hand.

So why does OWL fail in those examples? In my opinion, these examples expose a fundamental design limitation of OWL: OWL is hard-coded against specific design patterns, but anything that goes beyond those patterns cannot be expressed. Furthermore, the choice of supported design patterns is misguided by theoretical assumptions about DL inferencing that are quite often irrelevant for practical purposes.

Let's look at a longer version of this answer. The data model of the Semantic Web is a graph structure consisting of RDF triples. The strength of RDF is that people can define their own ways of representing data and knowledge, and thus create arbitrary RDF graph patterns. Users are free to define classes with any number of associated properties, forming larger structures that go far beyond the triple level.

In order to check constraints or execute rules on those graph structures, a general graph matching language is needed. A strong candidate for this is SPARQL, especially its WHERE clause. The WHERE clause is able to match fairly complex sub-graph patterns and provides variable bindings that can be used to report constraint violations or to fire the right hand side of a rule.
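To make the idea of "matching a sub-graph pattern and reporting the bindings" concrete, here is a rough sketch in plain Python rather than SPARQL. It is not Paul's query and not how SPIN is implemented; the sample data, the `match` helper and the chosen constraint (a concept must not use the same literal as both preferred and alternative label, in the spirit of the SKOS label conditions) are assumptions made purely for illustration.

```python
# Minimal sketch of WHERE-style graph matching over RDF-like triples.
# All data and helper names here are illustrative assumptions.

def is_var(term):
    """Terms starting with '?' are treated as variables, as in SPARQL."""
    return term.startswith("?")

def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for p_term, t_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or reject if it is already bound differently.
                if candidate.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:
            # This triple matched the first pattern; match the remaining ones.
            yield from match(rest, triples, candidate)

# Hypothetical sample data: ex:dog breaks the constraint, ex:cat does not.
triples = [
    ("ex:cat", "skos:prefLabel", "cat"),
    ("ex:cat", "skos:altLabel", "feline"),
    ("ex:dog", "skos:prefLabel", "dog"),
    ("ex:dog", "skos:altLabel", "dog"),
]

# The "WHERE clause": two patterns that share ?concept and ?label.
where = [
    ("?concept", "skos:prefLabel", "?label"),
    ("?concept", "skos:altLabel", "?label"),
]

violations = list(match(where, triples))
for v in violations:
    print(f"{v['?concept']} uses {v['?label']!r} as both prefLabel and altLabel")
```

Each binding that comes back identifies one offending concept together with the offending label, which is exactly the information a constraint report needs. A real SPARQL engine does the same job with indexes and query optimization instead of nested scans.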

OWL, on the other hand, is not able to represent arbitrary RDF graph patterns, but only the sub-set of those patterns that the designers of OWL found useful. Many of those patterns have seemingly arbitrary restrictions, as illustrated by Paul's examples (e.g., mixing different property types is not allowed in property chains). OWL 2 and some of its implementations such as the OWL API have driven this approach to extremes, making it impossible even to represent those patterns syntactically. This is because OWL 2 is not based on the RDF data model and therefore cannot talk about RDF in general.
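The property-chain restriction can be pictured with a toy rule whose body follows an object-valued property and then a literal-valued one. The sketch below is only an illustration of that kind of "mixed" chain, written as a general graph rule in Python; the property names (`ex:partOf`, `ex:countryCode`, `ex:inheritedCountryCode`) and the data are hypothetical and are not taken from Paul's posts.

```python
# Sketch of a rule that chains an object property with a datatype property.
# Property names and data are hypothetical, for illustration only.

triples = {
    ("ex:Berlin", "ex:partOf", "ex:Germany"),
    ("ex:Germany", "ex:countryCode", "DE"),   # literal-valued property
    ("ex:Paris", "ex:partOf", "ex:France"),   # ex:France has no code here
}

def apply_rule(graph):
    """If ?x ex:partOf ?y and ?y ex:countryCode ?c,
    infer ?x ex:inheritedCountryCode ?c."""
    inferred = set()
    for (x, p1, y) in graph:
        if p1 != "ex:partOf":
            continue
        for (s, p2, c) in graph:
            if s == y and p2 == "ex:countryCode":
                inferred.add((x, "ex:inheritedCountryCode", c))
    # Return only triples that are not already asserted.
    return inferred - graph

new_triples = apply_rule(triples)
print(sorted(new_triples))
```

In a general graph-matching language this is just two triple patterns joined on `?y`; nothing in the pattern cares whether the last value is a resource or a literal.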

So if you want to ask a question that the OWL 2 designers have not anticipated, then you cannot use OWL.

To make matters worse, OWL 2 is heavily influenced by research from the field of Description Logics, which many real-world users find both artificial and unhelpful. The goal of DL is to find a "tractable" sub-set of logic that allows inference engines to "guarantee" that all possible questions will be answered in finite time. While this sounds like an attractive value proposition from a theoretical point of view, practical evidence shows that the sub-set selected for OWL DL does not cover enough real-world use cases (see Paul's entries). Furthermore, there is enough practical evidence suggesting that while OWL DL inferencing may terminate in finite time, this time might be after the heat death of the universe and therefore completely useless. Just look at the mailing list archives of popular OWL DL inference engines to read complaints about how slow those engines are in the real world. With SPARQL and SPIN you can of course also create very slow queries, but at least you have much greater flexibility and expressivity. And as with any language, a fair amount of engineering and experience allows you to avoid performance pitfalls. You cannot throw an arbitrary complex query at your SQL database and expect ideal response times either. Engineering is needed.

In defense of OWL, there are lots of useful design patterns encoded in this language, and it is great that the community has a standard vocabulary to talk about classes and things like property cardinalities. There needs to be some standard to capture ontology design patterns, and OWL does a good job for many of them. But this makes OWL just one out of a catalog of vocabularies, on the same level as SKOS or FOAF or SIOC or GoodRelations. It's simply a good vocabulary to talk about classes, while SKOS is a good vocabulary to talk about taxonomies and GoodRelations is a good vocabulary to talk about business.

But for anything that is actionable for the real world, a combination of various vocabularies and a rich constraint and rule language like SPIN is needed.
