Wednesday, February 21, 2007

Hybrid Inferencing with TopBraid Composer 2.0

Many people who learn Semantic Web languages have a hard time understanding the concept of inference engines (aka reasoners). In a nutshell, an inference engine takes an "asserted" model as input and creates an "inferred" model as output. In the open world of the Semantic Web, inference engines can only add new triples, i.e. the output model is the input model plus additional inferred triples. In other words, inference engines derive new knowledge from existing knowledge.
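To make the "asserted in, inferred out" idea concrete, here is a minimal sketch in Python. It is not how Pellet or any real reasoner works internally. It assumes triples are plain (subject, predicate, object) tuples and that the only rule is type propagation along rdfs:subClassOf. The function name `subclass_engine` and the `house42` data are made up for illustration.

```python
def subclass_engine(asserted):
    """Toy inference engine: returns the asserted triples plus inferred rdf:type triples."""
    inferred = set(asserted)
    changed = True
    while changed:  # keep going so that chains of subclasses are followed
        changed = False
        for (s, p, o) in list(inferred):
            if p != "rdf:type":
                continue
            for (c, q, d) in list(inferred):
                is_superclass = q == "rdfs:subClassOf" and c == o
                if is_superclass and (s, "rdf:type", d) not in inferred:
                    inferred.add((s, "rdf:type", d))
                    changed = True
    return frozenset(inferred)


# Hypothetical asserted model
asserted = frozenset({
    ("House", "rdfs:subClassOf", "RealEstate"),
    ("house42", "rdf:type", "House"),
})

inferred = subclass_engine(asserted)
new_triples = inferred - asserted  # what the engine added
print(new_triples)
```

The output model always contains the input model; the engine only ever adds triples, it never removes or changes one.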

Various types of inference engines exist, including OWL DL tableaux reasoners like Pellet, and Rule engines such as Jena. Even SPARQL can be regarded as an inference engine, because its CONSTRUCT keyword can be used to derive new triples from existing triples. Some applications may also require custom-tailored reasoning engines that apply temporal or geospatial knowledge to infer new knowledge.

However, implementing a complete inference engine that adds some functionality on top of OWL and RDF Schema is a lot of work, and few people want to reinvent the wheel. For example, vanilla implementations of SPARQL and the Jena rules language have no built-in knowledge of OWL DL semantics, yet in many use cases, you want to run SPARQL or Jena on top of an OWL-compliant reasoning service such as Pellet. Fortunately, since all inference engines are essentially black boxes that take some triples as input and generate more triples as output, it is possible to link various engines together so that the output of one engine becomes the input to the next.
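The following sketch shows why running a rule on top of a reasoner's output matters. Both "engines" are hypothetical stand-ins written for this example (they are not the Pellet, Jena or SPARQL APIs): `type_reasoner` mimics one step of subclass reasoning, and `listing_rule` mimics a CONSTRUCT-style rule that only knows about direct rdf:type triples.

```python
def type_reasoner(triples):
    """Stand-in for an OWL reasoner: one step of rdf:type propagation along rdfs:subClassOf."""
    out = set(triples)
    for (s, p, o) in triples:
        if p == "rdf:type":
            for (c, q, d) in triples:
                if q == "rdfs:subClassOf" and c == o:
                    out.add((s, "rdf:type", d))
    return out


def listing_rule(triples):
    """Stand-in for a CONSTRUCT-style rule: every RealEstate instance gets listed."""
    out = set(triples)
    for (s, p, o) in triples:
        if p == "rdf:type" and o == "RealEstate":
            out.add((s, "ex:listedAs", "ex:Listing"))
    return out


def chain(engines, triples):
    """Feed the output of each engine into the next one."""
    for engine in engines:
        triples = engine(triples)
    return triples


asserted = {
    ("House", "rdfs:subClassOf", "RealEstate"),
    ("house42", "rdf:type", "House"),
}

rule_only = chain([listing_rule], asserted)
reasoner_then_rule = chain([type_reasoner, listing_rule], asserted)

listed = ("house42", "ex:listedAs", "ex:Listing")
print(listed in rule_only)           # the rule alone does not see house42 as RealEstate
print(listed in reasoner_then_rule)  # with the reasoner in front, it does
```

Because every engine has the same "triples in, triples out" shape, the chain itself needs no knowledge of what each engine does.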

Some support for such hybrid inference chaining had already been provided by previous versions of TopBraid Composer. However, this support was scattered across various places in the user interface and was not as flexible as it should have been. This limited its use and made things complicated to explain and demonstrate. So we took the opportunity of the new TopBraid Composer 2.0 release to generalize and clean up our inferencing support. How this works is illustrated in the screenshot below:

The configuration dialog above can be used to select and arrange reasoning engines for a given project. In this case,

  • we take an asserted model containing real-estate properties and execute Pellet over it. This reveals OWL DL relationships between concepts; for example, if House is a subclass of RealEstate, then any particular instance of House is inferred to be an instance of RealEstate as well. Supported by this additional knowledge,

  • we run a set of SWRL or Jena rules to infer relationships that cannot be expressed in OWL DL. For example, the Jena rules engine has built-in functions for mathematical calculations, so we can convert Australian dollar values into US dollars. After we have done these additional calculations,

  • we execute SPARQL CONSTRUCT queries to establish new triples and relationships. For example, now that we know the US dollar price of a certain House in Australia, we can evaluate whether it stays within my SPARQL FILTER maximum of US$900,000. Since this step may create new triples, and these new triples may lead to new relationships from Pellet's point of view,

  • we repeat the steps above until no additional triples have been added by any step.
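The loop described in the steps above can be sketched as a fixed-point iteration. This is only an illustration under stated assumptions, not Composer's implementation: the three engines are toy Python functions, the property and class names (`ex:priceAUD`, `ex:Affordable`, `ex:Suggestion`) are invented, and the exchange rate of 78 US dollars per 100 Australian dollars is an arbitrary value chosen for the example.

```python
USD_PER_100_AUD = 78   # assumed exchange rate, for illustration only
MAX_USD = 900_000      # the budget from the FILTER example


def type_reasoner(triples):
    """Stand-in for Pellet: one step of rdf:type propagation along rdfs:subClassOf."""
    out = set(triples)
    for (s, p, o) in triples:
        if p == "rdf:type":
            for (c, q, d) in triples:
                if q == "rdfs:subClassOf" and c == o:
                    out.add((s, "rdf:type", d))
    return out


def currency_rule(triples):
    """Stand-in for a Jena rule: derive a USD price from an AUD price."""
    out = set(triples)
    for (s, p, o) in triples:
        if p == "ex:priceAUD":
            out.add((s, "ex:priceUSD", o * USD_PER_100_AUD // 100))
    return out


def budget_rule(triples):
    """Stand-in for a SPARQL CONSTRUCT with a FILTER on the USD price."""
    out = set(triples)
    for (s, p, o) in triples:
        in_budget = p == "ex:priceUSD" and o <= MAX_USD
        if in_budget and (s, "rdf:type", "RealEstate") in triples:
            out.add((s, "rdf:type", "ex:Affordable"))
    return out


def run_inferences(engines, asserted):
    """Run the whole chain repeatedly until a full pass adds no triples."""
    triples = set(asserted)
    passes = 0
    while True:
        before = len(triples)
        for engine in engines:
            triples = engine(triples)
        passes += 1
        if len(triples) == before:  # engines only add, so equal size means no change
            return triples, passes


asserted = {
    ("House", "rdfs:subClassOf", "RealEstate"),
    ("ex:Affordable", "rdfs:subClassOf", "ex:Suggestion"),
    ("house1", "rdf:type", "House"),
    ("house1", "ex:priceAUD", 1_000_000),
    ("house2", "rdf:type", "House"),
    ("house2", "ex:priceAUD", 1_500_000),
}

inferred, passes = run_inferences([type_reasoner, currency_rule, budget_rule], asserted)
suggestions = sorted(s for (s, p, o) in inferred
                     if p == "rdf:type" and o == "ex:Suggestion")
print(suggestions, passes)
```

In this data set the budget rule adds a type triple at the end of the first pass, which the reasoner can only pick up in the second pass. That is why the chain has to be repeated rather than run once.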

If you want to play with this scenario yourself, download TopBraid Composer 2.0, import the example real-estate ontology and configure the inference engines as shown above. Use the new Run Inferences button to execute the process; it should come back with suggestions on which house to buy for some customers. Note that the file contains some constructs in both Jena rules notation and SPARQL notation. You can disable all Jena rules (except for the currency conversion) and still get the complete results; the Jena rules are only there to illustrate the syntax of both approaches.

While the capability to chain multiple inference engines together is not radically new, the new architecture and its simple, intuitive user interface in TopBraid Composer will open the door for many new kinds of applications. For example, you can define your own reasoning engines with SPARQL or Jena and use them to provide on-the-fly translations and mappings from one data structure into another. Simply collect your mapping rules in a separate ontology and give them to others so that they can put these rules into their own inference delegation chain. If the expressivity of OWL, SPARQL, SWRL or Jena is not sufficient for your needs, you may use a new Eclipse extension point in Composer to add your own Java functions for domain-specific reasoners (ask me for details if you are interested in this). Based on this architecture, we will start to roll out several new features for ontology mapping, data integration and mash-up creation in the coming months.

