Wednesday, February 21, 2007

Hybrid Inferencing with TopBraid Composer 2.0

Many people who learn Semantic Web languages have a hard time understanding the concept of inference engines (aka reasoners). In a nutshell, an inference engine takes an "asserted" model as input and creates an "inferred" model as output. In the open world of the Semantic Web, inference engines can only add new triples, i.e. the output model is the input model plus additional inferred triples. In other words, inference engines derive new knowledge from existing knowledge.

Various types of inference engines exist, including OWL DL tableaux reasoners like Pellet, and rule engines such as the one in Jena. Even SPARQL can be regarded as an inference engine, because its CONSTRUCT keyword can be used to derive new triples from existing triples. Some applications may also require custom-tailored reasoning engines that apply temporal or geospatial knowledge to infer new knowledge.

However, implementing a complete inference engine that adds some functionality on top of OWL and RDF Schema is a lot of work, and few people want to reinvent the wheel. For example, vanilla implementations of SPARQL and the Jena rules language have no built-in knowledge of OWL DL semantics, yet in many use cases, you want to run SPARQL or Jena on top of an OWL-compliant reasoning service such as Pellet. Fortunately, since all inference engines are essentially black boxes that take some triples as input and generate more triples as output, it is possible to link various engines together so that the output of one engine becomes the input to the next.

Some support for such hybrid inference chaining had already been provided by previous versions of TopBraid Composer. However, this support was scattered across various places in the user interface and was not as flexible as it should have been. This limited its use and made things complicated to explain and demonstrate. So we took the opportunity of the new TopBraid Composer 2.0 release to generalize and clean up our inferencing support. How this works is illustrated in the screenshot below (click on the image for a larger picture):



The configuration dialog above can be used to select and arrange reasoning engines for a given project. In this case,

  • we take an asserted model containing real-estate properties and execute Pellet over it. This reveals OWL DL relationships between concepts, for example to infer that if House is a subclass of RealEstate, then any particular instance of House is also an instance of RealEstate. Supported by this additional knowledge,

  • we run a bunch of SWRL or Jena rules to infer relationships that cannot be expressed in OWL DL. For example, the Jena rules engine has a built-in function to do mathematical calculations, so that we can, for example, convert Australian dollar values into US dollars. After we have done these additional calculations,

  • we execute SPARQL CONSTRUCT queries to establish new triples and relationships. For example, now that we know the US dollar price of a certain House in Australia, we can evaluate whether it matches my SPARQL FILTER maximum of 900,000 $US. Since this step may create new triples, and these new triples may lead to new relationships from Pellet's point of view,

  • we repeat the steps above until no additional triples have been added by any step.
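
The loop described in the steps above can be sketched in a few lines of Python. This is only an illustration of the chaining idea, not how TopBraid Composer, Pellet or Jena are implemented: triples are plain tuples, the three engine functions are simplified stand-ins for the Pellet, rules and SPARQL steps, and the names ex:priceAUD, ex:priceUSD, ex:AffordableRealEstate and the exchange rate are assumptions made up for this example.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, object]
Engine = Callable[[FrozenSet[Triple]], Iterable[Triple]]

AUD_TO_USD = 0.79  # made-up exchange rate, for illustration only


def subclass_engine(triples: FrozenSet[Triple]) -> Set[Triple]:
    """Stand-in for the OWL DL step: propagate rdf:type along rdfs:subClassOf."""
    subclass_of = {(s, o) for s, p, o in triples if p == "rdfs:subClassOf"}
    return {
        (s, "rdf:type", sup)
        for s, p, cls in triples if p == "rdf:type"
        for sub, sup in subclass_of if sub == cls
    }


def currency_engine(triples: FrozenSet[Triple]) -> Set[Triple]:
    """Stand-in for the rules step: convert Australian dollars into US dollars."""
    return {
        (s, "ex:priceUSD", round(o * AUD_TO_USD, 2))
        for s, p, o in triples if p == "ex:priceAUD"
    }


def affordable_engine(triples: FrozenSet[Triple]) -> Set[Triple]:
    """Stand-in for the CONSTRUCT step: flag real estate priced up to 900,000 USD."""
    estates = {s for s, p, o in triples if p == "rdf:type" and o == "ex:RealEstate"}
    return {
        (s, "rdf:type", "ex:AffordableRealEstate")
        for s, p, o in triples
        if p == "ex:priceUSD" and s in estates and o <= 900_000
    }


def run_inferences(asserted: Iterable[Triple], engines: Iterable[Engine]) -> Set[Triple]:
    """Run the engines in order, repeating until a full pass adds no new triples."""
    model = set(asserted)
    engines = list(engines)
    while True:
        size_before = len(model)
        for engine in engines:
            # Each engine only ever adds triples; it never removes any.
            model |= set(engine(frozenset(model)))
        if len(model) == size_before:
            return model


if __name__ == "__main__":
    asserted = {
        ("ex:House", "rdfs:subClassOf", "ex:RealEstate"),
        ("ex:h1", "rdf:type", "ex:House"),
        ("ex:h1", "ex:priceAUD", 1_000_000),
    }
    chain = [subclass_engine, currency_engine, affordable_engine]
    for triple in run_inferences(asserted, chain) - asserted:
        print(triple)
```

Note that the last engine depends on triples produced by both earlier ones, which is why the order of the chain and the repetition until nothing changes both matter.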


If you want to play with this scenario yourself, download TopBraid Composer 2.0, import the example real-estate ontology and configure the inference engines as shown above. Use the new Run Inferences button to execute the process - it should come back with suggestions on which house to buy for some customers. Note that the file contains some constructs in both Jena rules notation and SPARQL notation. You can disable all Jena rules (except for the currency conversion) and still get the complete results - the Jena rules are only there to illustrate the syntax of both approaches.

While the capability to chain multiple inference engines together is not radically new, the new architecture and its simple, intuitive user interface in TopBraid Composer will open the door for many new kinds of applications. For example, you can define your own reasoning engines with SPARQL or Jena and use them to provide on-the-fly translations and mappings from one data structure into another. Simply collect your mapping rules in a separate ontology and give them to others so that they can put these rules into their own inference delegation chain. If the expressivity of OWL, SPARQL, SWRL or Jena is not sufficient for your needs, you may use a new Eclipse extension point in Composer to add your own Java functions for domain-specific reasoners (ask me for details if you are interested in this). Based on this architecture, we will start to roll out several new features for ontology mapping, data integration and mash-up creation in the coming months.

OWL 1.1 Support in TopBraid Composer 2.0

OWL 1.1 is an evolving extension of the Web Ontology Language (OWL). Driven by a world-wide community of researchers, this proposal aims at fixing some of the most frequently encountered limitations of OWL such as the lack of user-defined datatypes, and cumbersome disjointness axioms. The new 1.1 features include extra syntactic sugar, additional property and qualified cardinality constructors, extended datatype support, simple metamodelling, and extended annotations.

While OWL 1.1 is still at a rather early stage, and will not become an official recommendation before 2008, I believe that OWL 1.1 contains many invaluable extensions that are already useful for everyday work. We therefore decided to integrate 1.1 support into the new 2.0 release of TopBraid Composer. The Pellet version integrated in Composer already supports reasoning with many of the OWL 1.1 constructs, and more 1.1 tools and APIs will follow. At the same time, users need to be aware that they are adopting the language at a stage where it will almost certainly change. As a result, ontologies that use the new features now may contain deprecated constructs until the language has been officially sanctioned. Furthermore, tool support is on the bleeding edge and will not be as stable and reliable as support for OWL 1.0.

Here are some examples of how the new features are used in Composer. User-defined datatypes are subtypes of the built-in datatypes such as xsd:int and xsd:string. In the screenshot below, all instances of the class Adult must have their age between 18 and 65:


OWL 1.1 Local DataRange


Internally, the new OWL 1.1 language constructs are represented as additional triples. In the example below, I have restricted the range of the age property to fall within the interval of 0 to 200. As shown in the nested form, the rdfs:range of the property is an instance of the system class owl:DataRange, with additional attributes to represent facets such as minInclusive.


OWL 1.1 User-defined datatype


A common problem with OWL DL ontologies is that a large number of disjointness axioms is needed to declare that certain classes do not overlap (i.e. do not share instances). OWL 1.1 introduces the disjointUnionOf property as a convenient shortcut for such cases. In the following TopBraid Composer screenshot, the class RealEstate is defined as the disjoint union of its subclasses Apartment, House and Townhouse using a compact xor notation. Reasoning engines that see such statements would insert pairwise disjointness axioms between those classes on the fly.


OWL 1.1 xor
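
To see why the shortcut helps, here is a small Python sketch of the pairwise expansion. It is only an illustration of the counting, not code from Composer or any reasoner; the function name pairwise_disjointness is made up for this example, and the axioms are written as simple tuples.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def pairwise_disjointness(classes: Sequence[str]) -> List[Tuple[str, str, str]]:
    """Return one owl:disjointWith axiom for every unordered pair of classes."""
    return [(a, "owl:disjointWith", b) for a, b in combinations(classes, 2)]


if __name__ == "__main__":
    for axiom in pairwise_disjointness(["Apartment", "House", "Townhouse"]):
        print(axiom)
```

Three classes need three axioms, but the number grows as n(n-1)/2, so ten sibling classes would already need 45 disjointness statements if written out by hand.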


Qualified Cardinality Restrictions (QCRs) allow you to define the class of Persons that have some children, at least two of which must be Female:


OWL 1.1 QCR
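
The difference between a plain and a qualified cardinality restriction can be sketched in Python as follows. Please treat this as a simplification: it simply counts the values found in the data, assuming that different names denote different individuals and that nothing is missing, whereas an OWL reasoner works under open-world semantics. The function name and the ex: identifiers are assumptions made up for this example.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


def has_at_least(subject: str, prop: str, n: int, triples: Set[Triple],
                 filler_class: Optional[str] = None) -> bool:
    """Check whether subject has at least n values for prop.

    If filler_class is given, only values typed with that class are counted,
    which is what makes the restriction "qualified".
    """
    values = {o for s, p, o in triples if s == subject and p == prop}
    if filler_class is not None:
        values = {v for v in values if (v, "rdf:type", filler_class) in triples}
    return len(values) >= n


if __name__ == "__main__":
    data = {
        ("ex:eve", "ex:hasChild", "ex:tom"),
        ("ex:eve", "ex:hasChild", "ex:sam"),
        ("ex:eve", "ex:hasChild", "ex:ann"),
        ("ex:tom", "rdf:type", "ex:Male"),
        ("ex:sam", "rdf:type", "ex:Male"),
        ("ex:ann", "rdf:type", "ex:Female"),
    }
    # Unqualified: eve has at least two children of any kind.
    print(has_at_least("ex:eve", "ex:hasChild", 2, data))
    # Qualified: eve does not have at least two children typed as Female.
    print(has_at_least("ex:eve", "ex:hasChild", 2, data, "ex:Female"))
```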


There are several other extensions - please see TopBraid Composer's documentation (Help > Help Contents > TopBraid Composer) and the OWL 1.1 documents on the Web for details. Don't forget to activate OWL 1.1 support in the ontology start page - by default these features are switched off so that users can safely rely on OWL 1.0 if they prefer. Feel free to drop me a line to provide your feedback and questions.

Announcing TopBraid Composer 2.0

We are very excited to announce the release of TopBraid Composer 2.0 today. This milestone release makes Composer the first professional Semantic Web development tool to support the evolving OWL 1.1 specification. We have also significantly improved the inferencing architecture, allowing TopBraid users to execute reasoners in a delegation chain, and to use SPARQL as a rule language. I will write detailed blog entries about these features shortly.

In addition to these and many other new features (listed on our web page), we have spent extra time debugging to make sure that Composer stays ahead of its competitors in terms of stability and performance. We have discovered and worked around some Eclipse incompatibilities on Mac computers. Doing a major release was also a good opportunity to review and update the help pages: I actually ended up adapting all pages and replacing all the screenshots. Finally, we have simplified the installation on Windows, and now provide a clean installer package including Java and Eclipse.

Reflecting on the rapid evolution of the tool since its first release in early 2006, I would like to acknowledge all of our users who have provided invaluable feedback and otherwise supported our work. We very much appreciate having you on board! Many of the features that you find in Composer these days are also a direct result of TopQuadrant's semantic web projects with customers around the world. We all hope you enjoy TopBraid Composer 2.0 as much as we do!