Composing the Semantic Web: January 2009

Sunday, January 18, 2009

The Object-Oriented Semantic Web with SPIN

We have recently introduced the SPARQL Inferencing Notation (SPIN) as a mechanism that uses SPARQL to formalize the meaning and behavior of Semantic Web concepts. SPIN provides a light-weight RDF vocabulary that can be used to attach rules (SPARQL CONSTRUCT queries) and constraint checks (ASK queries) to class definitions.

The design of SPIN was driven by requirements that we (at TopQuadrant) have collected from customers from "real-world" semantic technology projects. For many of those customers, the current stack of Semantic Web languages is simply not enough. OWL has been designed for certain use cases, but it fails to meet other requirements due to its open-world assumption. As Stefan Decker points out in the entertaining "An OWL 2 Far?" panel at the 2008 ISWC conference, OWL cannot even be used to check whether an instance of a class meets the cardinality restrictions. In the same panel, Tim Finin argues that OWL does not need to be the only language on the Semantic Web stack, and that instead families of languages are needed depending on the use case. So while OWL is great for some modeling tasks, there is room (and demand) for other modeling paradigms.

In this article I will introduce SPIN from another angle to illustrate what makes it different, and why I believe that it provides good solutions to many real-world requirements. My main point is that SPIN borrows good practices from object-oriented programming and modeling languages and integrates object-oriented techniques with the flexible architecture of the Semantic Web to produce a new way of working with linked data.

As a "Hello World" example let's start with a simple ontology about geometric shapes. It defines a class Rectangle with the following characteristics (screenshot from TopBraid Composer 3):

In this RDF/SPIN ontology, a Rectangle is a class that has three properties. The values of width and height are specified by the user, and a SPIN rule is used to compute the value of the area property by multiplying width and height. Let's have a look at the rule first. A SPIN rule is a SPARQL CONSTRUCT query that has been attached to a class using the spin:rule property. The CONSTRUCT query can use whatever feature of SPARQL it wants, for example to do mathematical calculations.
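As a rough sketch, a rule of this kind could look like the query below. The ss: namespace and the exact property URIs are assumptions made for illustration, and BIND is the SPARQL 1.1 form of the assignment (the LET keyword mentioned later in this article plays the same role).

```sparql
# Hypothetical namespace for the shapes ontology (an assumption, not the real file)
PREFIX ss: <http://example.org/spinsquare#>

# Attached to ss:Rectangle via spin:rule; a SPIN engine binds ?this
# to each instance of the class before running the query.
CONSTRUCT {
  ?this ss:area ?area .
}
WHERE {
  ?this ss:width ?width .
  ?this ss:height ?height .
  BIND (?width * ?height AS ?area)
}
```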

A key contribution of SPIN is to introduce a mechanism that allows users to organize those SPARQL queries in a natural, object-oriented way. SPIN rules are not just plain lists of rules like in comparable rule languages (SWRL etc). Instead, the "procedural attachment" of SPIN means that you can arrange the rules in the class hierarchy where they belong. This follows the OO principles of abstraction and encapsulation. Since the rules (and constraints) are attached to classes, any human or agent who looks at the ontology can quickly understand the meaning of the classes and properties. Furthermore, the rules are "scoped" so that tools are better guided when they need to execute the rules and constraints. For example, TopBraid Composer's SPIN engine comes with a mode in which it does incremental inferencing. This means that whenever someone changes the values of width or height, then the value of area will update automatically, as shown with the example instance below:

Now let's extend the example and introduce a sub-class of Rectangle called Square. Squares inherit all characteristics of Rectangles, but they are constrained in so far that width and height of a Square must be equal. This fact is expressed using a SPIN constraint as shown below:
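A constraint along these lines might be sketched as follows. The ss: namespace and property names are illustrative assumptions; the query returns true exactly when the two values differ, that is, when the constraint is violated.

```sparql
# Hypothetical namespace for the shapes ontology (an assumption)
PREFIX ss: <http://example.org/spinsquare#>

# Attached to ss:Square via spin:constraint; ?this is the instance being checked.
# A result of true signals a violation: width and height are not equal.
ASK WHERE {
  ?this ss:width ?width .
  ?this ss:height ?height .
  FILTER (?width != ?height)
}
```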

The constraint is an ASK query that must evaluate to false for all instances of the associated class (Square). Again, any SPARQL graph matching pattern can be tested here, and the dedicated variable ?this is used to access the current instance as a starting point. Here is an example instance of the class Square:

You can see that editing tools can use the constraint definitions to verify user input, and to provide warnings (yellow markers) if values violate a constraint. Since the SPIN constraints are scoped to the class, these tests can be performed very efficiently, only on the instance that the user is currently looking at. In TopBraid Composer, constraints can be incrementally tested after each editing step.

The image above also illustrates another object-oriented aspect of SPIN, namely inheritance: The class Square is a sub-class of Rectangle and thus also inherits the rule that calculates the area from width and height. The same inheritance mechanism applies to constraints, and sub-classes can further constrain or specialize the meaning of their parents.

Now let's look at the meta-modeling capabilities of SPIN, and its closed-world semantics. Previous articles on this blog introduced SPIN Functions and SPIN Templates. These are really powerful mechanisms that allow anyone to create their own modeling vocabulary. SPIN Functions are SPARQL functions that can be used in FILTER or LET statements. SPIN Templates are re-usable SPARQL queries that can be instantiated with parameters. In particular you can use pre-defined templates instead of typing in SPARQL queries by hand. SPIN comes with a library of such pre-defined functions and templates to support common modeling patterns. One of the templates from this library, called spl:Attribute, can be used to link a class with a property, specifying min/max cardinality, value type and default value all in a single place. Here is an example attribute definition from the Rectangle class:

This attribute definition is an instance of the spl:Attribute template, attached to Rectangle via spin:constraint. Its equivalent in OWL would be a collection of three owl:Restrictions attached to the class via rdfs:subClassOf. However, if we look at the definition of the spl:Attribute template, we can see that the semantics are different:

The query above is the body of the spl:Attribute template. When used as a spin:constraint, the system will report constraint violations whenever an instance is found that violates one of the three conditions specified in the query. Among them is the test for the minimum and maximum number of values: if a Rectangle has fewer than one or more than one value for width or height, then the system will report an error. If a SPIN template is found, the engine will substitute the occurrences of the argument variables (such as ?predicate and ?minCount) with the provided arguments. In other words, instead of using the template spl:Attribute, the class Rectangle could also have the following constraint, encoded directly as a SPARQL ASK query:
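A minimal sketch of such a directly encoded constraint for the width property is given here. It assumes a cardinality of exactly one and a value type of xsd:integer; the value type, the ss: namespace, and the argument order of spl:objectCount and spl:instanceOf are assumptions for illustration, not a copy of the library's template body.

```sparql
PREFIX spl: <http://spinrdf.org/spl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
# Hypothetical namespace for the shapes ontology (an assumption)
PREFIX ss:  <http://example.org/spinsquare#>

# ?this is expected to be pre-bound by the SPIN engine to the instance under test.
# The query returns true (a violation) if any one of the three branches matches.
ASK WHERE {
  # fewer values than the minimum cardinality of 1
  { FILTER (spl:objectCount(?this, ss:width) < 1) }
  UNION
  # more values than the maximum cardinality of 1
  { FILTER (spl:objectCount(?this, ss:width) > 1) }
  UNION
  # a value that is not of the declared value type
  { ?this ss:width ?value .
    FILTER (!spl:instanceOf(?value, xsd:integer)) }
}
```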

However, such SPARQL queries are complicated and hard to maintain, so wrapping those queries into re-usable SPIN templates is a far better approach. Again, the principle of object-oriented encapsulation is put to use in SPIN. The user of a template does not need to understand all the low-level details of the underlying SPARQL. Similarly, a user does not need to understand the detailed definitions of SPIN functions (such as spl:objectCount and spl:instanceOf shown above), but instead he or she can simply pass in the arguments and leave the details to the engine. Using the SPIN templates mechanism, even the source code becomes easy to read:

I have uploaded the complete example source code of the spinsquare.n3 file. You will notice that SPIN uses a triple-based RDF representation of SPARQL to overcome the limitations of a pure textual syntax.

To summarize, SPIN applies concepts from object-oriented languages to the Semantic Web. The meaning and behavior of classes can be formalized in a machine-executable way using rules and constraints, and those rules and constraints are scoped locally to aid the execution engine in achieving best performance. Furthermore, SPIN can be used to define new modeling vocabularies, such as attribute definitions with closed-world semantics. If other semantics are needed, then they can be represented as well, as I have shown in a previous posting on expressing OWL 2 RL in SPIN.

The expressive power of SPARQL and its wide adoption in industry means that languages like SPIN could go a long way to extend the reach of Semantic Web technology into areas where other languages fail. SPIN combines a powerful rule language with meta-modeling mechanisms including templates and functions. Instead of limiting ourselves to a single modeling paradigm or language, SPIN makes it possible to create task-specific new languages, or domain-specific services such as unit conversion. All those SPIN-based languages are completely self-describing and thus only require a single unified execution engine. Thinking about this on the Semantic Web scale means that linked data can be published and used in much more powerful ways than currently envisioned.

Friday, January 09, 2009

SPARQLpedia as an Example SPARQLMotion Web Application

TopQuadrant has recently launched SPARQLpedia, a new web service that allows users to share SPARQL queries and to search for queries that others have submitted. The submitted queries are managed on a server-side RDF database together with searchable metadata such as author and submission date. Here is a screenshot of a simple SPARQLpedia web search interface:

Pressing the Search button will display a list of matching database entries:

And the user can click on any of the search results to display details (and execute the selected query):

The underlying services of SPARQLpedia can also be called as (REST-based) web services, as described on the API page.

In a sense, SPARQLpedia is a typical web application:

The application's designers have prepared a database to store the entries
Users can add or delete entries from that database
Users can search for entries in the database by various criteria
An HTML web interface can be used to interact with the database
It can also be accessed programmatically via web services

In this blog article, I will give some details on how SPARQLpedia was implemented and highlight the role of SPARQLMotion as its server-side scripting language.

sparqlpedia.org hosts a standard Apache Tomcat server that runs a TopBraid Live 3.0 beta application. Installing this server was straightforward and basically consisted of dropping the TopBraid Live war file into the tomcat applications folder. TopBraid Live itself is a generic application development framework that makes tons of RDF/OWL related services available through its APIs. There are two server-side APIs in TopBraid Live:

The SPARQLMotion API is an entirely model-driven way of creating web services based on the collection of services wrapped as SPARQLMotion modules.
The TopBraid Live Java API can be used to access and extend the capabilities of the server. In our example, we only needed it to add some new specialized SPARQL functions.

Setting up the Database

The first step of developing SPARQLpedia was to set up a database with an associated base schema. We are using TopBraid Composer for this purpose. The entries in the database are themselves SPARQL queries (but it's easy to translate this to databases hosting product data, academic publications, medical records or whatever). SPARQL queries are entered as strings, but the database stores them in the SPIN RDF Syntax, because this will later allow us to run sophisticated queries on various aspects of the query that would be difficult to achieve if we only had the string representation. So, as a start, we have defined an empty schema ontology that imports the SPIN (sp) namespace. The rest of our database schema is simple: we store Entries that have been submitted by Users, as illustrated in the class diagram below.

The schema is a collection of RDFS classes and RDF properties. The class spedia:Entry only has a single subclass spedia:QueryEntry, but we may want to add additional types of entries later, such as discussion threads or votes. The schema is stored in a file spedia.owl in our Eclipse workspace.

Next we create a persistent database that will contain the submitted instances. For simplicity, we use the Sesame 2 native Java back-end in TopBraid, but we could have also used any of the other database types supported by TopBraid, including AllegroGraph and Oracle. Our Sesame database imports the spedia.owl from above and its files will also be stored in the workspace. We give it the base URI http://sparqlpedia.org/public, so that we can access it later as a named graph from our SPARQLMotion scripts.

Setting up the SPARQLMotion scripts

SPARQLMotion is a visual semantic web scripting language that can be used to build data processing pipelines through a graphical user interface. Typical SPARQLMotion scripts take some input, do some processing and then create some output. TopBraid Composer 3, Maestro Edition is used to build SPARQLMotion scripts, so everything we do (from schema definition and database maintenance to the implementation of the services) is done within a single uniform environment. Let's have a look at an example script, the outline of which is shown in the following screenshot.

This SPARQLMotion script (stored in an OWL file deleteQuery.sms.n3 in the workspace) implements the functionality to delete a query from the repository. The script takes two arguments as input:

The uri identifying the query that shall be deleted
The password of the submitting user - in SPARQLpedia only the original author of a query can also delete it

SPARQLMotion scripts should be laid out (and read) from top to bottom, i.e. you see input coming in from the top, then the input will be processed through a pipeline and finally some results are returned. Each node in the diagram is of a certain module type, and the SPARQLMotion modules library provides a comprehensive list of frequently needed data processing tasks. The deleteQuery script has two exit points, marked by the two red icons at the bottom:

The left end module is used when the script is called via the web service API and just returns the string "OK" as its result.
The right end module is used when the script is called to render an HTML page.

Both end modules have the same type sml:ReturnText but return different mime types. The rest of the script is the same in both cases, i.e. the arguments and the steps to perform the actual deletion are used independently of whether we use the web service API or the HTML call. The web services themselves are declared as subclasses of spin:Functions as shown below:

The class spedia:deleteQueryBase is an "abstract" base class of the two different services and defines the arguments with the usual SPIN function syntax:

The two non-abstract subclasses of deleteQueryBase "inherit" the argument declarations but also point to the SPARQLMotion module in the script that creates the result. For example, the web service deleteQueryHTML has the module ReturnHTML as its return module:

Once the function has been stored in the workspace, it is accessible through a REST URL call, such as http://sparqlpedia.org:8080/tbl/server/tbl/servlet?action=sparqlmotion&id=deleteQuery&uri=...&password=... At development time, we can run the same script within TopBraid Composer ME by hitting localhost:8083 instead. Or we can debug the script manually through the debug button in the graph view of TBC. This has the benefit that we can look at each intermediate step of the script and inspect the state of the triple store and variable bindings with a few mouse clicks.

A Closer Look at the SPARQLMotion Script

Let's walk through the example script from above. The script starts with two Argument modules, which are placed automatically by TopBraid based on the function definition. Technically, these are the same instances of spl:Argument as shown as spin:constraints on the deleteQueryBase class. Here is a screenshot of the form for the password argument:

Each argument can declare a value type that can be used for error checking, and will be used to transform the REST arguments (always strings) into the correct kind of RDF literals or resources. All downstream modules of the SPARQLMotion script can now access the value of the password argument as a SPARQL variable called ?password. The other argument of the service is accessible as ?uri. The next module Check entry exists is of the type sml:AssertTrue (new in 3.0) and simply verifies that the provided URI is in fact a valid entry in the database:
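The check performed by such a module could be sketched as the following ASK query. The spedia: namespace URI is an assumption for illustration; ?uri is expected to be pre-bound to the argument value by the script.

```sparql
# Hypothetical namespace URI for the SPARQLpedia schema (an assumption)
PREFIX spedia: <http://example.org/spedia#>

# ?uri arrives pre-bound from the script argument.
# The assertion passes only if the resource is typed as a query entry.
ASK WHERE {
  ?uri a spedia:QueryEntry .
}
```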

When this module is executed, the specified ASK query will be executed. If the query returns false, the script will exit with an error message. The error message is constructed from the template given as sml:text, in which {?uri} is substituted with the actual argument value. But wait, which triple store does this query run on? If you scroll up to the script's overview you can see that the Check entry exists module also has the Connect to DB module as one of its predecessors. As usual in SPARQLMotion, the triples represented by the predecessors will be visible in the queries downstream. As shown in the following picture, Connect to DB just opens the Sesame database:

The module above connects to the database via its base URI, so we could later replace it with some other kind of database with the same base URI. Just in case our Sesame DB would explode in size... Ok, by now we have reached the stage where we have verified that the provided ?uri is in fact a valid instance of spedia:QueryEntry in our database. Next, let's validate the password. Another sml:AssertTrue module is used for that purpose as shown below.
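A password check of this kind might be sketched as the ASK query below. The property names spedia:submittedBy and spedia:password, as well as the namespace URI, are hypothetical and chosen only for illustration; the actual schema may use different names.

```sparql
# Hypothetical namespace URI and property names (assumptions, not the real schema)
PREFIX spedia: <http://example.org/spedia#>

# ?uri and ?password arrive pre-bound from the script arguments.
# The assertion passes only if the submitting user stores exactly this password.
ASK WHERE {
  ?uri spedia:submittedBy ?user .
  ?user a spedia:User .
  ?user spedia:password ?password .
}
```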

The key aspect of the Check password module is an ASK query that gets the spedia:User object that has submitted the query with the given ?uri and then checks whether this user has the provided ?password. It throws an error if the password in the database does not match. Once all those tests are passed, the script can go on with the actual delete. The Delete entry module is an instance of sml:PerformUpdate that runs a SPARQL update call deleting all triples that have the given ?uri as their subject. Now that the query is gone the script forks, depending on how it was called. Assuming the script was called to return an HTML page, it continues with the right branch and the module Return delete query result will be ignored. The resulting HTML page will look like the following:

In order to produce this HTML page, the end result module uses a template string with the basic HTML outline. Only the URI string is different each time, the rest is static. The Return HTML module looks like the following.

The HTML page itself is encoded as a template into which {?uriString} and {?footer} will be inserted at execution time. An alternative way of creating HTML pages from a template is via the Semantic JSP support in TopBraid and SPARQLMotion (not shown here).

The outcome is then sent back to the client using text/html mime format. The footer is re-used in several HTML pages in our system, so we import it from a file:

To complete the script, a small detail is that we want to display the ?uri as a full string and therefore insert a string conversion module before we insert it into the template:

That's it! The script is now finished and ready to be used, assuming the deleteQuery.sms.n3 file has been uploaded to the TopBraid Live server's workspace.

The other services are also implemented using SPARQLMotion scripts:

submitQuery takes a query string, comment, user name and password as arguments and uses SPARQL INSERT queries to insert those as QueryEntries into the database.
findQueries takes a namespace, resource URIs or a user name as arguments and then runs a SPARQL CONSTRUCT query to create an RDF response. This response is optionally rendered into an HTML table. To gain best performance, the SPARQL query string is assembled dynamically based on the input from various clause templates.
renderQuery takes a query URI and creates a pretty HTML page from it.

Summary

This example shows how the TopBraid Live platform and its SPARQLMotion support can be used to implement scalable public web services and HTML-based internet applications. SPARQLMotion can be used to define almost arbitrary REST-based web services. Deployment of those services together with the ontologies and triple stores they operate on is fairly simple. The scripts and the ontologies can be defined and tested using TopBraid Composer ME.

Please be aware that the approach shown here covers just one aspect of the TopBraid Live platform. Another approach for developing user interfaces is via TopBraid Ensemble. Version 3.0 (coming soon) is a complete framework for building rich Flex-based interfaces from configurable components. More on this some other day...

Thursday, January 08, 2009

Understanding SPIN Templates

The SPIN Modeling Vocabulary defines a mechanism to encapsulate SPARQL queries so that they can be reused in different contexts: SPIN Templates. A SPIN Template is basically a canned SPARQL query that is parameterized with arguments. In this blog entry, I will walk through an example SPIN template to explain how templates are defined and used. At the end of this article I will also boldly explain why I believe that SPIN is potentially one of the missing links in the Semantic Web puzzle, and a disruptive killer technology.

If you haven't done so yet, please read the previous posting about Understanding SPIN Functions. SPIN Functions and Templates basically use the same mechanisms - the main difference is that templates are more general.

We will again use the kennedysSPIN ontology that is shipped with TopBraid Composer 3.0. This example ontology defines various family relationships, including parent, grandFather and grandMother. The example uses a template called InferGrandParent to derive the values of the two grand parent properties from the values of parent and gender.

Let's first have a look at how these relationships could be inferred using plain SPARQL queries. In the following screenshots, we have attached two SPARQL CONSTRUCT queries as spin:rules to the class Person.
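As a sketch, the rule for grandFather could look like the query below; the rule for grandMother would differ only in the predicate (grandMother) and the gender value (female). The kennedys: namespace URI is an assumption for illustration.

```sparql
# Hypothetical namespace URI for the kennedysSPIN ontology (an assumption)
PREFIX kennedys: <http://example.org/kennedys#>

# Attached to kennedys:Person via spin:rule; ?this is each person in turn.
CONSTRUCT {
  ?this kennedys:grandFather ?grandParent .
}
WHERE {
  ?this kennedys:parent ?parent .            # one step up
  ?parent kennedys:parent ?grandParent .     # a second step up
  ?grandParent kennedys:gender kennedys:male .
}
```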

The upper rule computes the values of grandFather while the lower rule infers the values of grandMother. We can see that both rules are almost identical: both queries walk up the parent relationship two steps and then check the gender of the grand parent. The only differences are the pairs grandFather/male and grandMother/female, but the basic structure of the queries is the same. SPIN templates can be used to generalize such query patterns so that they can be reused in a more maintainable way. In our example, we will introduce a template that allows us to replace the two individual SPARQL queries, so that the rules will look like those in the following screenshot.

In order to get there, we first introduce a template called InferGrandParent, the definition of which is shown below.

A SPIN template is a class that has the metaclass spin:Template as its type. I recommend creating a subclass of the system class spin:Templates to keep your templates organized in the class tree. A template should have a comment describing what the template does and can have any number of arguments. These arguments are similar to the arguments of SPIN Functions described earlier; the main difference is that the arguments of templates are unordered and can point to any property instead of being limited to sp:arg1, sp:arg2 etc.

In our example template, we need two arguments:

the gender that we are matching against (male or female)
the predicate that we want to infer (grandFather or grandMother)
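With these two arguments, the body of such a template might be sketched as follows. The variables ?predicate and ?gender stand for the two arguments; the kennedys: namespace URI is an assumption for illustration.

```sparql
# Hypothetical namespace URI for the kennedysSPIN ontology (an assumption)
PREFIX kennedys: <http://example.org/kennedys#>

# ?predicate and ?gender are the template arguments;
# ?this is the person the rule is applied to.
CONSTRUCT {
  ?this ?predicate ?grandParent .
}
WHERE {
  ?this kennedys:parent ?parent .
  ?parent kennedys:parent ?grandParent .
  ?grandParent kennedys:gender ?gender .
}
```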

The values of those arguments will be inserted into the body query of the template at execution time. Let's take the rule to infer the grandFather relationship as an example. The instantiated template is an instance of the template class in which the values for gender and predicate are filled in:

When executed as a SPIN rule, the template call above will be substituted with the body of the template, so that we get:

So instantiating the template has the effect that the variables ?predicate and ?gender have been replaced with the arguments; in this case: grandFather and male. The same template can now be reused for the pair grandMother and female.

To summarize, SPIN templates are a powerful mechanism to put SPARQL queries into a (black) box so that you or others can reuse a piece of behavior, even without having to worry about the specific details of the query. Instead, they can use a simple form-based interface to "fill in the blanks" and let the system do the low-level SPARQLing.

Taking this idea further, SPIN templates are a mechanism to create arbitrary new ontology modeling languages. For example, I have shown how you can create templates that encapsulate unit conversion expert knowledge. Another example was the re-definition of OWL 2 RL using SPARQL, although the latter did not really use template arguments because all rules were global. The SPIN Standard Modules Library is another example of very generic templates for tasks such as cardinality constraint checking. But these examples may be just the tip of the iceberg. There could be a marketplace for SPIN template libraries consisting of domain-specific modeling languages with executable semantics.

Instead of hard-coding a pre-defined collection of modeling constructs such as OWL with its system vocabulary owl:Restriction, owl:FunctionalProperty etc, SPIN is an extensible framework for building your own modeling language. There is no need for SPIN-aware tools to hard-code the semantics of any specific modeling language - all they need to understand is how to execute SPIN templates, and the rest is completely driven by whatever is encoded in the particular ontology. SPIN users are not restricted by the vocabulary that any particular W3C committee has selected for them. Instead the democracy of the Semantic Web user community will select the ideal set of those constructs that are really needed in practice.

Understanding SPIN Functions

We have recently introduced SPIN, a light-weight vocabulary that enables the use of SPARQL to define constraints and inference rules for semantic web models. One very powerful facet of SPIN is that it can be used to define new SPARQL FILTER and LET functions without writing a single line of programming code. The goal of this blog entry is to explain the mechanisms of user-defined SPIN functions using an example from the kennedysSPIN ontology that is shipped with TopBraid Composer 3.0.

Let's look at an example function called getFather that returns the father of a given person:

Each SPIN function is an instance of the metaclass spin:Function. The best way to create a new function is to create a subclass of spin:Functions, a system class that groups together all available functions in the class hierarchy. The name of the function defines the URI under which it will be accessible from SPARQL queries. The function getFather is in a file with the default namespace, i.e. no prefix is needed to call it. The following example shows a function call of :getFather() in the LET assignment.

You can see that the getFather function takes one argument. Each SPIN function must formally declare its arguments. For each argument, the function class must have a spin:constraint that points to an spl:Argument object. For each argument of the function, the spl:Argument specifies:

a comment describing the argument
the name and index (e.g., sp:arg1 for the first argument)
the value type (such as xsd:string)
whether it is optional or not
a default value in case it is optional

The easiest way of creating such an Argument definition in TopBraid is via drag and drop: locate sp:arg1 in the Properties View and then drag it over the spin:constraint label to instantiate a spl:Argument template. Here is the definition of the first (and only) argument of getFather in TopBraid:

The fact that those arguments are attached using spin:constraint might be confusing at first. You can ignore this and simply treat it as a convention that SPIN uses to look up which arguments are defined for a function. A technical explanation is that function calls are instances of the function class, and that those instances must fulfill the value type constraint encoded in the spl:Argument. spl:Argument is a SPIN template that encodes this value type check by means of another SPARQL query. But again, you can ignore this aspect.

But now let's have a look at the key aspect of a SPIN function definition: the function's body. The property spin:body links a function class with a SPARQL query. This is the query that will be executed whenever the function is called. The SPARQL query must be either an ASK query or a SELECT query.

If the body is an ASK query, then the function's result is true or false
If the body is a SELECT query, then the function's result is the first variable binding of the result variable in the SELECT clause. All other values will be ignored. Null will be returned if no binding exists.

In the getFather function, the body is a SELECT query, because we are interested in a specific instance of person, bound to the result variable ?father:
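A body of this kind could be sketched as the query below, assuming that a father is modeled as a parent whose gender is male. The kennedys: namespace URI is an assumption for illustration.

```sparql
# Hypothetical namespace URI for the kennedysSPIN ontology (an assumption)
PREFIX kennedys: <http://example.org/kennedys#>

# ?arg1 is the function argument, pre-bound by the SPIN engine at call time.
# The first binding of ?father becomes the function's result.
SELECT ?father
WHERE {
  ?arg1 kennedys:parent ?father .
  ?father kennedys:gender kennedys:male .
}
```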

As you can see in the screenshot above, the body query refers to the variable ?arg1 and TopBraid Composer displays those variables in bold face. At execution time, ?arg1 will already have a value pre-assigned to it, namely the function's argument. In the following function call, ?arg1 is bound to the instance kennedys:JohnKennedy.

Now, when the SPIN engine executes the getFather function, it will pre-bind the ?arg1 argument so that it becomes:
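Assuming a body that selects a male parent of ?arg1, the effect of pre-binding can be pictured as the query below. The namespace URI and the exact shape of the body are assumptions for illustration.

```sparql
# Hypothetical namespace URI for the kennedysSPIN ontology (an assumption)
PREFIX kennedys: <http://example.org/kennedys#>

# Same body as the function definition, with ?arg1 replaced
# by the actual argument kennedys:JohnKennedy.
SELECT ?father
WHERE {
  kennedys:JohnKennedy kennedys:parent ?father .
  ?father kennedys:gender kennedys:male .
}
```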

So to summarize, SPIN functions are wrapped SPARQL queries that contain references to argument variables such as ?arg1 and ?arg2. These argument variables will be pre-assigned to the actual arguments at execution time.

Once you have defined your functions, you can store them in a file whose name contains .spin. before the extension, e.g. myFunctions.spin.n3, and you will be able to use them anywhere in TopBraid without even importing the functions' file. This allows anyone to create and share libraries of functions, for example to do generic things such as unit conversion.

These very same mechanisms are also used to define SPIN templates, but this is another topic...

Wednesday, January 07, 2009

SPARQLpedia: Sharing Semantic Web queries on the Semantic Web

Together with the TopBraid Composer 3.0 beta, TopQuadrant is also launching a new web service called SPARQLpedia. This is a free, public service that hosts SPARQL queries in a searchable repository. Anyone can submit new SPARQL queries. Anyone can search the repository for queries that mention a given namespace, certain resource URIs or have been submitted by certain users. All submitted queries that mention a SPARQL endpoint (FROM clause) can be conveniently executed online with a Run Query button.

The SPARQLpedia repository can be accessed in three ways:

as a (REST-based) web service from any application
via an HTML web interface at http://sparqlpedia.org
using submit and query buttons in TopBraid Composer

Here is a screenshot of the web interface displaying an example query from the repository:

Here is a screenshot of the new SPARQLpedia search view of TopBraid Composer. It shows how users can conveniently store a library of their favorite queries and use the queries from this library with a few mouse clicks.

There are several motivations for us to implement this service. First, I believe that many SPARQL queries are reusable in one way or another - they may demonstrate design patterns or ask common questions. There is a lot of useful knowledge encoded in queries. In the spirit of modern web applications, SPARQLpedia provides a mechanism to share and reuse queries on a global scale.

Second, SPARQLpedia serves as an example application to demonstrate how our TopBraid Live platform and SPARQLMotion can be used to implement RDF-based services. I will provide details on the implementation of SPARQLpedia and its use of SPIN and SPARQLMotion in a follow up posting. Since SPARQLpedia is implemented using SPARQLMotion, any company with a TopBraid Live license can also host their local SPARQL repositories inside of their firewalls to share frequently used queries among team members.

Tuesday, January 06, 2009

SPIN Box: A SPARQL-based Computer Game Engine

Here is another example video (5 minutes) illustrating what you can do with SPIN. TopBraid Composer 3.0 beta comes with a SPARQL-based computer game engine called SPIN Box. This engine allows ontology developers to create new kinds of computer games without having to hard-code anything in a programming language like Java. Instead, the engine runs SPARQL CONSTRUCT rules (using SPIN) to determine the behavior of each field in the game.

The game may appear silly but the underlying message is a powerful one: the game demonstrates how Semantic Web standards and tools can be used for model-driven application development. The SPIN framework can be used to create domain models that have executable behavior attached to them. Being based on RDF and SPARQL, SPIN files can be shared online and re-purposed for different use cases. For example, anyone can extend the behavior of a computer game by introducing new types of objects. Theoretically such applications can even use other (Semantic) Web resources or background knowledge to drive their behavior.

While traditional applications operate on closed worlds, the Semantic Web has been designed to be open and linkable. Frameworks such as SPIN open an application's architecture and can provide an unprecedented level of dynamic behavior and flexibility. The box is now open.

Monday, January 05, 2009

Video: SPARQL-based Unit Conversion with SPIN

Here is a video on another example application of SPIN. This time we are using SPARQL rules to convert values between units of measurement. In the example we convert from centimeters to feet and from cubic meters to cubic feet. Two solutions are presented. The first uses SPARQL CONSTRUCT queries that have the conversion factors encoded directly in the spin:rules. The second solution is much more generic and uses a NASA units ontology (developed by TopQuadrant) to dynamically find the appropriate conversion factors. For this second solution, the video also shows how to define new SPARQL functions using SPIN, and how to use SPIN templates to encapsulate reusable queries. The video is a step-by-step tutorial and takes about 16 minutes to complete. If you haven't done so yet, you should have a look at the TopBraid Composer SPIN page to get some background and screenshots about SPIN functions and templates.

I believe this is a fine example of what can be achieved if Semantic Web technology is applied the way that it should: to share data models and knowledge together with executable semantics. SPIN might be a missing link in the Semantic Web stack, because it allows users to create and share domain-specific modeling languages that encapsulate all the background needed to build useful applications out of them.
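The difference between the two solutions can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how the SPIN rules or the units ontology are actually written: the `UNITS` table is a hypothetical stand-in for the ontology, and the function names are made up for this sketch. The only facts used are that one foot is exactly 0.3048 meters and one centimeter is 0.01 meters.

```python
# Approach 1: the conversion factor is hard-coded inside the rule itself.
CM_PER_FOOT = 30.48  # exact, because 1 ft = 0.3048 m


def cm_to_feet(value_cm: float) -> float:
    return value_cm / CM_PER_FOOT


# Approach 2: the rule is generic and looks the factors up in shared data.
# UNITS is a hypothetical stand-in for a units ontology. Each unit maps to
# (quantity kind, factor that converts one such unit to the SI base unit).
UNITS = {
    "centimeter": ("length", 0.01),
    "foot": ("length", 0.3048),
    "cubic_meter": ("volume", 1.0),
    "cubic_foot": ("volume", 0.3048 ** 3),
}


def convert(value: float, source: str, target: str) -> float:
    source_kind, source_factor = UNITS[source]
    target_kind, target_factor = UNITS[target]
    if source_kind != target_kind:
        raise ValueError(f"cannot convert {source} to {target}")
    # Go through the base unit: source -> base -> target.
    return value * source_factor / target_factor


print(round(cm_to_feet(182.88), 6))
print(round(convert(182.88, "centimeter", "foot"), 6))
print(round(convert(1.0, "cubic_meter", "cubic_foot"), 4))
```

With the first approach every new pair of units needs its own rule. With the second, adding a unit means adding one entry to the shared data, and the single generic rule covers all pairs of the same quantity kind.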

OWL 2 RL in SPARQL using SPIN

The evolving OWL 2 standard comes with a profile called OWL 2 RL. According to the OWL 2 RL W3C page "the OWL 2 RL profile is aimed at applications that require scalable reasoning without sacrificing too much expressive power. It is designed to accommodate both OWL 2 applications that can trade the full expressivity of the language for efficiency, and RDF(S) applications that need some added expressivity from OWL 2. This is achieved by defining a syntactic subset of OWL 2 which is amenable to implementation using rule-based technologies".

This means that there will soon be a well-defined specification that expresses the semantics of a good subset of OWL in a format that can be handled by rule engines. Many OWL implementations (such as OWLIM or Jena) have already used such a rule-based approach for ages and in many cases their performance is much better than with tableaux-based OWL implementations.

Based on its CONSTRUCT keyword, SPARQL can also be considered to be a rule language. SPIN is a new SPARQL-based vocabulary that we have recently introduced with TopBraid Composer 3.0 beta. SPIN can be used to encapsulate reusable SPARQL queries as templates. These templates can then be instantiated in any RDF or OWL ontology to add inference rules and constraint checks.

Mostly as an exercise and a proof-of-concept, I have converted the OWL RL rules into SPIN templates. The SPIN library at http://topbraid.org/spin/owlrl now contains the complete OWL 2 RL specification in executable form, formalized in SPARQL CONSTRUCT rules.

The following screen shows the example OWL RL rule cax-eq2 encoded as a SPIN template:

The example above implements the rule that if c1 has been declared to be owl:equivalentClass of c2 and x is an instance of c2, then x is also an instance of c1. In order to activate this type of inferencing in your model, you just need to instantiate this template as a spin:rule at owl:Thing. Or simply import the file http://topbraid.org/spin/owlrl-all, which activates all rules for all instances of owl:Thing.
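To make the semantics of this rule concrete, here is a minimal sketch in plain Python that applies it to a small set of triples. It is not the SPIN template itself: triples are modeled as tuples of strings, and the `ex:` resources and the function name `cax_eq2` are made up for the illustration.

```python
RDF_TYPE = "rdf:type"
EQUIVALENT_CLASS = "owl:equivalentClass"


def cax_eq2(triples):
    """Return the triples that rule cax-eq2 adds to the given set.

    If c1 owl:equivalentClass c2 and x rdf:type c2, infer x rdf:type c1.
    """
    triples = set(triples)
    inferred = set()
    for c1, p, c2 in triples:
        if p != EQUIVALENT_CLASS:
            continue
        for x, q, cls in triples:
            if q == RDF_TYPE and cls == c2:
                inferred.add((x, RDF_TYPE, c1))
    # Report only triples that were not already stated.
    return inferred - triples


graph = {
    ("ex:Human", EQUIVALENT_CLASS, "ex:Person"),
    ("ex:alice", RDF_TYPE, "ex:Person"),
}
new_triples = cax_eq2(graph)
print(new_triples)  # {('ex:alice', 'rdf:type', 'ex:Human')}
```

Running the function again on the enlarged graph yields nothing new, which is the point at which a rule engine would stop iterating.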

This mechanism can not only be used to fine-tune the inferences for a specific model, but also to enhance the expressivity of other inference engines. TopBraid Composer allows users to combine inference engines, e.g. to run SPARQL rules on top of Jena inferencing. By activating a couple of rules from the OWL RL library, you can use some new OWL 2 keywords such as owl:key or owl:propertyChain in your model. And you can do all this yourself - just write your own SPIN rules to implement your own domain-specific modeling language and then share them with the others on the Semantic Web!

Note: the OWL RL SPIN library above is basically untested, but I would appreciate bug reports or other suggestions on how to improve it. The conversion of most rules was straightforward, but there were a handful of rules that were tricky to convert to SPARQL, especially those with "for-all" semantics. In one case (owl:key) I had to introduce a user-defined SPIN function to negate a list traversal. In some other cases I had to rely on built-in Jena functions such as list:member. For property chains, I only implemented chains of length two, e.g. the infamous "uncle" relationship. However, other lengths can easily be added if they ever become relevant in practice.

The following example uses the OWL RL property chain rule to infer the uncle relationship between two instances. The source code of the corresponding OWL file (in N3) is below, updated for the latest OWL 2 version from June 11, 2009.


@prefix spin:    <http://spinrdf.org/spin#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix owlrl:   <http://topbraid.org/spin/owlrl#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix :        <http://topbraid.org/spin/owlrl-test#> .

<http://topbraid.org/spin/owlrl-test>
 a       owl:Ontology ;
 owl:imports  ;
 owl:versionInfo "Created with TopBraid Composer"^^xsd:string .

# Instantiate SPIN template with property chain semantics
owl:Thing
 spin:rule
         [ a       owlrl:prp-spo2-2
         ] .

:Person
 a       owl:Class ;
 rdfs:subClassOf owl:Thing .

:Darwin
 a       :Person ;
 :parent :Holger .

:Holger
 a       :Person ;
 :brother :Thorsten .

:Thorsten
 a       :Person .

:brother
 a       owl:ObjectProperty .

:parent
 a       owl:ObjectProperty .

:uncle
 a       owl:ObjectProperty ;
 owl:propertyChainAxiom (:parent :brother) .

# defines the new OWL 2 property if needed
owl:propertyChainAxiom
 a       rdf:Property ;
 rdfs:label "property chain axiom"^^xsd:string ;
 rdfs:range rdf:List .
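
The inference that this file is set up to produce can be traced with a small plain-Python sketch. It only mimics what a length-two property chain rule does over the three instances above and is not the code of the owlrl:prp-spo2-2 template. The function name `apply_chain` is made up for the illustration.

```python
def apply_chain(triples, chain, inferred_property):
    """Infer (x, inferred_property, z) for every x -p1-> y -p2-> z."""
    first, second = chain  # only chains of length two are handled
    inferred = set()
    for x, p1, y in triples:
        if p1 != first:
            continue
        for y2, p2, z in triples:
            if p2 == second and y2 == y:
                inferred.add((x, inferred_property, z))
    return inferred


# The instance data from the N3 file, as (subject, predicate, object).
facts = {
    (":Darwin", ":parent", ":Holger"),
    (":Holger", ":brother", ":Thorsten"),
}

# :uncle owl:propertyChainAxiom (:parent :brother)
uncles = apply_chain(facts, (":parent", ":brother"), ":uncle")
print(uncles)  # {(':Darwin', ':uncle', ':Thorsten')}
```

The order of the properties in the chain matters: reversing it to (:brother :parent) matches nothing in this data.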

Friday, January 02, 2009

Video: SPARQL-based inferencing and constraint checking with SPIN

Here is a short and sweet 3-minute video demonstrating how to use SPIN to execute inferences and to do constraint checking on the Kennedy example ontology. The content of the video is similar to what is explained (with screenshots) on the TopBraid SPIN page. The example files are part of the TopBraid Composer 3.0 download, i.e. you can replay the scenario on your own machine.

Introducing SPIN: the SPARQL Inferencing Notation

With this week's release of TopBraid Composer 3.0 beta1, TopQuadrant is adding new lego bricks to the Semantic Web stack. SPIN is a collection of RDF vocabularies enabling the use of SPARQL to define constraints and inference rules on Semantic Web models. Let me give you some (technical) background on why I believe SPIN will be useful. Future postings will elaborate on use cases and example applications.

One of the main selling points of Semantic Web technology is the ability to publish domain models with executable semantics. Most Semantic Web models contain class and property definitions together with definitions of ranges, domains, OWL restrictions, OWL property types, SWRL rules etc. These formal definitions can be used by any tool that implements the underlying languages to operate on the model even if the tool does not have any hard-coded knowledge about the domain. So if I publish a Semantic Web model stating that all instances of Person can have string values for firstName, then Semantic Web tools can build suitable input forms to collect instances. Or, if I include a rule that states that the age of a Person is the current date minus his or her date of birth, then any Semantic Web tool can automatically compute the value of age just by executing the rule. Again, nothing needs to be hard-coded and the tool can dynamically discover what a given model is all about. This is also the foundation for various data integration and information discovery tasks.

RDF and RDF Schema only provide very limited expressivity for such definitions, and it has been (intentionally) left to higher-level languages such as OWL to provide richer modeling constructs. However, people quickly recognized that in practice OWL does not meet all requirements and use cases, so that additional languages like SWRL (and recently RIF) have been proposed. These are rule-based languages that contain constructs for IF-THEN conditions which infer new triples when a pre-condition is met in the current state of the model. These rule languages cover very important use cases and many practitioners find them quite natural to use.

Now let's get back to the use cases of rich Semantic Web languages. Typically, people use them for two different purposes:

constraint checking: test whether the model is in a consistent/expected state
deriving new values: compute implicit property values from what's stated in a model

The focus of OWL is on the latter aspect. Nevertheless, many people use it for constraint checking as well, either because they misunderstand its semantics or because they deliberately ignore the open-world assumption and the lack of a unique name assumption. Strictly speaking this use is incorrect, and the fact that OWL is misused for these tasks indicates that other languages are required to fill the gap.

But the quest for good modeling languages does not have to stop at OWL or SWRL - there is another well-known language in the Semantic Web space that can be used to formalize semantics: SPARQL. SPARQL is a firmly established W3C standard query language and is implemented by all major Semantic Web stores on the market. SPARQL is very expressive as it provides means to define matches against almost arbitrary RDF graph patterns in the WHERE clauses. Also, many Semantic Web practitioners are already familiar with SPARQL and various query editing tools exist. Furthermore, SPARQL seems to meet the users' expectations very well with regard to things like the open-world assumption: SPARQL queries only operate on the triples mentioned in the WHERE clause - no other implicit assumptions are used at query execution time. You get what you see.

Most people know that SPARQL has the SELECT query form, but there is also the extremely useful CONSTRUCT keyword and the simple ASK keyword. The SPIN Modeling Vocabulary makes heavy use of the latter two keywords. To simplify a bit, SPIN suggests using

ASK for constraint checking, and
CONSTRUCT for deriving new values

So, a SPIN-based ontology is a collection of classes and properties plus ASK and CONSTRUCT queries. The question then is: how can we connect those queries to the domain models? How can we store the queries together with the model in a seamless way?

In previous incarnations of SPIN (when it was not called SPIN yet), TopBraid had simply stored the SPARQL queries as strings as part of the domain model. We had used a dedicated property called sparql:query that would point from any RDF resource to a SPARQL string. This approach was of course fairly weak. Relying on a purely textual representation is error-prone: for example, when someone renames a resource, the change must also be made in every query string that mentions it. It also leaves open how to handle the namespace prefixes used in SPARQL queries.

In order to provide a maintainable representation of SPARQL queries, SPIN defines an RDF vocabulary for storing SPARQL queries. Instead of storing an ASK query as a string, SPIN stores it as an instance of a dedicated RDF class sp:Ask etc. For example, the SPARQL query

    ASK WHERE {
        ?this my:age ?age .
        FILTER (?age < 18) .
    }

can be represented in SPIN RDF syntax in N3 format as

    [ a       sp:Ask ;
      sp:where ([ sp:object sp:_age ;
                  sp:predicate my:age ;
                  sp:subject spin:_this
                ] [ a       sp:Filter ;
                  sp:expression
                          [ sp:arg1 sp:_age ;
                            sp:arg2 18 ;
                            a sp:lt
                          ]
                ])
    ]

This may remind some of you of SWRL, where Semantic Web rules are also triplified, or of OWL class expressions, which look similarly complex in RDF. The RDF syntax is not necessarily pretty but it's intended to be used by software, not humans. In the case of SPIN, editing tools like TopBraid display these constructs in human-readable SPARQL syntax on the screen. Furthermore, there is a free public web service for converting between the two SPARQL syntaxes.
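Why a structured representation is easier to maintain than a string can be shown with a small plain-Python sketch. Nested dictionaries stand in for the RDF nodes of the example query, and variables are simplified to strings such as "?age" rather than resources such as sp:_age. The renderer and the `rename` helper are made up for this illustration, and the renderer supports only what this one query needs.

```python
# The example ASK query as a nested structure (a stand-in for the RDF form).
ask_query = {
    "type": "sp:Ask",
    "where": [
        {"subject": "?this", "predicate": "my:age", "object": "?age"},
        {"type": "sp:Filter",
         "expression": {"type": "sp:lt", "arg1": "?age", "arg2": 18}},
    ],
}

OPERATORS = {"sp:lt": "<"}  # only the operator used in this example


def render_expression(expr):
    return f"{expr['arg1']} {OPERATORS[expr['type']]} {expr['arg2']}"


def render_element(element):
    if element.get("type") == "sp:Filter":
        return f"FILTER ({render_expression(element['expression'])}) ."
    return f"{element['subject']} {element['predicate']} {element['object']} ."


def render_ask(query):
    """Turn the structure back into human-readable SPARQL text."""
    body = "\n".join("    " + render_element(e) for e in query["where"])
    return "ASK WHERE {\n" + body + "\n}"


def rename(node, old, new):
    """Rename a resource wherever it occurs as a value in the structure."""
    if isinstance(node, dict):
        return {key: rename(value, old, new) for key, value in node.items()}
    if isinstance(node, list):
        return [rename(value, old, new) for value in node]
    return new if node == old else node


print(render_ask(ask_query))
print(render_ask(rename(ask_query, "my:age", "my:ageInYears")))
```

Because the property is a node in the structure rather than a substring of some text, renaming it is a simple traversal, and the readable SPARQL form is regenerated on demand.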

But the main achievement here is that we are now able to store SPARQL expressions as part of our Semantic Web models, and use SPARQL's rich expressivity to describe the concepts from our domain. The next question is: where do we put those SPARQL expressions? In SWRL, the inference rules have global scope and are simply placed anywhere in the model (as instances of swrl:Imp). In OWL, a frame-based approach is used where logical descriptions are attached to classes using rdfs:subClassOf or owl:equivalentClass. The latter has the advantage of providing some context and scope to the rules, i.e. the ontology designer consciously attaches the pieces of domain knowledge to the classes or properties where they belong. Inheritance similar to that from object-oriented modeling is used to re-use and specialize those definitions.

SPIN supports both approaches, i.e. rules and constraints can be either global or scoped in the context of a given class. The recommended approach is to attach SPIN declarations to classes, following object-oriented design. Similar to object-oriented languages like Java, there is a special variable called ?this which refers to the current instance. For example, assume you attach a rule that computes the age of a person from his or her date of birth to the class my:Person. If my:Parent is a subclass of my:Person, then the rule will also be applied to all instances of my:Parent. At execution time, the variable ?this will be bound to the current instance (of either my:Person or my:Parent). Global rules are simply attached to the root classes rdfs:Resource or owl:Thing and do not mention ?this.
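The way a class-scoped rule reaches the instances of subclasses can be sketched in plain Python. This is an illustration of the mechanism only, not SPIN's implementation: the instances my:alice and my:bob, the birthYear values, and all function names are made up, and age is simplified to a difference of years.

```python
SUBCLASS_OF = {"my:Parent": "my:Person"}  # child class -> parent class

INSTANCES = {
    "my:alice": {"type": "my:Person", "birthYear": 1980},
    "my:bob": {"type": "my:Parent", "birthYear": 1950},
}


def age_rule(this, current_year):
    """Rule body: derive my:age for the instance bound to `this`."""
    return ("my:age", current_year - INSTANCES[this]["birthYear"])


# Rules are attached to classes; this one is declared once, at my:Person.
RULES = {"my:Person": [age_rule]}


def classes_of(instance):
    """Yield the instance's class followed by all of its superclasses."""
    cls = INSTANCES[instance]["type"]
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def run_rules(current_year):
    """Run every rule found on an instance's class or its superclasses."""
    inferred = {}
    for instance in INSTANCES:
        for cls in classes_of(instance):
            for rule in RULES.get(cls, []):
                prop, value = rule(instance, current_year)
                inferred[(instance, prop)] = value
    return inferred


print(run_rules(2009))
```

The rule is stated only at my:Person, yet my:bob, an instance of my:Parent, receives an age as well because the lookup walks up the subclass chain.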

The result of this mechanism is that SPIN users can exploit the whole range of SPARQL features to make their domain models executable, even on the scale of the Semantic Web. The following postings will provide some examples on how to use SPIN as a rule and constraint language. To summarize where we are so far, SPIN is a very light-weight mechanism that leverages SPARQL for new application areas that go far beyond querying. But there are additional capabilities in SPIN, for user-defined functions and query templates which I will introduce in future postings as well. Please stay tuned...
