Tuesday, April 26, 2011

Faceted Search with TopBraid and SWP

Many Semantic Technology companies offer some kind of faceted browsing tool. With TopBraid 3.5 it was time for TopQuadrant to say "me too", and add some unique capabilities into the mix.

The main idea of faceted browsing is to allow users to narrow down a set of objects by selecting properties that the sought-after objects must possess. For example, if you search for people in the infamous Kennedy ontology, you may want to find all instances of Person that went to the same university and share the same profession. TopBraid's faceted search component follows a user interface paradigm made popular by FreeBase Parallax: you start with a set of all Persons and the system will compute how many matches are in each category. Clicking on a category will narrow down the set, and you can add the next condition. The following screenshot illustrates this, with the facet "alma mater" narrowed down to "Harvard University".

TopBraid's faceted search is implemented by a collection of TopBraid Live servlets and a JavaScript UI library. What you see on the screen above is in fact a web browser embedded into TopBraid Composer. The default stylesheet is simple and can be customized, and it's also possible to use the same JavaScript library in completely different web applications.

One distinguishing capability of TopBraid's Faceted Search support is its customizability. SPARQL Web Pages (SWP) technology can be used to customize the visual appearance of the preview results on the right hand side. A key benefit of SWP is the linkage between ontologies and user interface snippets. Basically, SWP allows you to attach HTML snippets to any RDFS or OWL class in your domain model using the property ui:instanceView, and the system is then able to dynamically select the best suitable visualization for any object that it gets. For example, the visualization for kennedys:Person can be changed as shown below.

The faceted search component looks for visualizations marked with ui:id="facetSummary", and will display them as shown below.

Further customizations are possible without any programming: For example you can specify which properties shall be visible by default, and which properties shall not be selectable as facets.

There is more to be said about this new capability. But if you just want to get started, use TBC-ME 3.5, select the class that you want to search instances of, switch to the Browser tab and pick the facet.ui:SearchView view in the drop down list. Note that on Windows this currently does not work because Eclipse includes an outdated internal web browser. Please use the button Open current page in external browser. Like with any new feature, we appreciate your feedback.

TopBraid Composer's Web Data Basket: Collecting Linked Data while you browse

One of the little new features in TopBraid Composer 3.5 is the Web Data Basket view. This can be used to incrementally download Linked Data (either RDFa or RDF) while browsing the web. The best way to experience this is by getting a small TBC Firefox extension. This will add a tiny TopBraid button to the lower right corner of your browser.
Click on this button while TopBraid Composer is executing, and all RDF data encoded on the currently visited page will be added to TBC's Web Data Basket:

While this Basket displays the raw triples, it also has options to add the loaded triples into the current model. For example, you will get a proper foaf:Person for David Bowie if you visit his DBpedia page:

In order to facilitate the use of this data, TopBraid Composer will automatically add missing imports to namespaces such as foaf and skos. When you follow a hyperlink in your web browser, the basket will get more content. This means that the system will accumulate any Linked Data into TopBraid as you navigate through the web.

This little Web Data Basket makes it easy to collect Linked Data without having to leave your favorite tools. I think it provides a fine example of how Linked Data could be used, e.g. to build up a shopping list of products backed with GoodRelations data.

Thursday, April 21, 2011

SPINMap: SPARQL-based Ontology Mapping with a Graphical Notation

One of the new features in the upcoming TopBraid 3.5 release is called SPINMap. SPINMap is a SPARQL-based language to represent mappings between RDF/OWL ontologies. These mappings can be used to transform instances of source classes into instances of target classes. This is a very common requirement to create Linked Data, for example starting with spreadsheets, XML files or databases, but also from one domain-specific ontology into a more generic one. As a first impression, here is a picture of SPINMap in action:

If you would like to learn about this with a visual demo, please take a look at the

In the rest of this blog entry I will cover similar content to the video, but with screenshots and prose.

Introduction to SPINMap

SPARQL is a rich language that can be used for many purposes. The SPARQL CONSTRUCT keyword is particularly useful to define rules that map from one graph pattern (in the WHERE clause) to another graph pattern. This makes it possible to define sophisticated rules that map instances from one class to instances of another one.

The SPIN framework provides several mechanisms that make the definition of such SPARQL-based mapping rules easier. In particular, SPIN makes it easy to associate mapping rules with classes, and SPIN templates and functions can be exploited to define reusable building blocks for typical modeling patterns.

The SPINMap vocabulary (http://spinrdf.org/spinmap) is a collection of reusable design patterns that reflects typical best practices in ontology mapping. SPINMap models can be executed in conjunction with other SPARQL rules with any SPIN engine. The main advantage of SPINMap is that it provides a higher-level language that is suitable to be edited graphically. TopBraid Composer 3.5 provides a visual editor that makes it easy to establish ontology mappings using drag and drop, and filling in forms.

It is a good practice to store the ontology mapping rules in files separate from the source and target files. The mapping file only needs to import the SPINMap namespace (which in turn imports SPIN etc). The easiest way to get started is to use File > New > RDF/OWL/SPIN File... and then to activate the check box for "SPINMap Ontology Mapping Vocabulary", as shown below.
This will create an empty file importing http://topbraid.org/spin/spinmapl. As a next step, you should drag the source and target ontologies into the Imports view so that those get imported into the mapping ontology. Then select the class you want to start mapping, and switch to the Diagram tab. In the example below, the source ontology A defines a class a:Person, and we want to map it into the target class b:Customer.

Use drag and drop (e.g. from the Classes view) to add other classes to the Diagram. If the SPINMap namespace is present, the Diagram will provide additional capabilities and use a different layout algorithm than usual. If you move the mouse over a class, a triangular anchor point will appear in the upper right corner of the class box. It will turn green if you move the mouse over it, and if it can be made the source of a mapping. Click on this and keep the mouse button pressed to establish a link to another class. Move the mouse over the incoming upper anchor of the target class and release the mouse. A dialog like the one below will appear.

This dialog is used to create a "mapping context" that is later used to determine how the target instances shall be selected from the source instances. In particular this is used to construct URIs from the values of a given resource, e.g. so that a:Instance-0-1 is turned into b:John-Smith. The dialog provides a collection of target functions that can be used for that purpose. You simply need to pick an appropriate function and fill in the blanks to establish a mapping context. In the example screenshot, a new URI is constructed from the values of the source properties a:firstName and a:lastName and a provided URI template. This assumes that those properties together serve as unique identifiers, similar to primary keys in a database. Other algorithms can be created if needed through SPIN functions.

As soon as you have filled in all required arguments of the mapping context function, the preview panel of the dialog will give you an idea of how the resulting values will look like. When you are happy with this, press OK.

The resulting context will be displayed with a yellow graph node as shown below.

If you ever need to edit this context node again, e.g. to change the URI template, just double-click on it. Right-clicking the node opens a context menu with an option to delete it.

Once a context has been established between two classes, the user interface makes it possible to add transformations. In the example above, the source class has a property a:dob that holds date of birth values as raw strings, such as "30/04/1985". We want to map this into the target property b:birthDate, which is a well-formed xsd:date in the format "1985-04-30". TopBraid's SPARQL library provides a built-in function spif:parseDate to make this task easier. Use the mouse to draw a connection from a:dob to b:birthDate. A dialog such as the following will appear.

In this dialog you can either manually select a transformation function, or check if the system has any suggestions for you, on the Suggestions tab. In this case, the system suggests spif:parseDate with pre-defined patterns to convert raw dates into valid xsd:date literals. Pressing OK, this creates a mapping transformation as shown below.

At any point in time, TopBraid Composer makes it easy to try the mapping out. Assuming TopSPIN is the selected inference engine, just press the Run Inferences button in the main tool bar to see the results.

As you can see above, each instance of the a:Person class has been mapped into a corresponding instance of b:Customer. The URI of the target resources has been generated using the string insertion template based on first name and last name. Furthermore, proper birth dates have been generated from the raw source strings. The context menu of the Inferences view provides options to assert the resulting RDF triples if desired, or you can use the Triples View to move them elsewhere.

It is possible to add any number of other transformations in similar ways. Some transformations take more than one argument. In that case, additional input anchor points will be displayed, as shown for the node "concat with separator" below.

Note that a complex example like above uses a number of different design patterns. Some additional of those patterns are explained in the tutorial video, that I would strongly recommend if you want to save time with this technology.

Understanding and Extending SPINMap

The mini tutorial above might be enough for many users to get started. For advanced users with knowledge of SPIN, the following background may be helpful to understand how SPINMap works, and how it can be extended.

SPINMap is an entirely declarative application of SPIN. This means you can explore the mappings generated by the visual editor from an RDF perspective, e.g. using TBC forms. In the example above, the form for a:Person displays a collection of SPIN Template calls:

You can drill into the templates by opening up the + sign that appears when you hover the mouse over the template icon.

The example above illustrates that SPINMap is based on a (small) collection of generic templates, such as spinmap:Mapping-2-1 which represents a mapping from 2 source properties into 1 target property. Each of those templates a linked to a spinmap:Context which is used at execution time to determine the target URIs. Furthermore, the argument spinmap:expression points to a SPARQL expression, SELECT or ASK query, or even a constant URI or literal that is used to compute the target value from the source value(s). The SPINMap templates are using the function spin:evalto evaluate those expressions at execution time. When executed, the expression will be invoked with pre-assigned values for ?arg1, ?arg2 etc, based on the current values of spinmap:sourcePredicate1 on the source instances.

Since in practice any SPARQL function can be used as spinmap:expression, users can also add their own SPIN functions where appropriate. It is also possible to use the built-in SPARQL functions such as xsd:string().
The mapping context uses a similar mechanism, also based on spin:eval to create target URIs. You can open any instance of spinmap:Context to see how this is done.

In the example above, the target function spinmapl:buildURI2 is used to derive a new URI from two input properties and a template. You are free to define your own target functions there, as long as they are instances of spinmap:TargetFunction (and subclass of spinmap:TargetFunctions).

If you are writing your own functions, or want to make the system smarter, you can add your own spinmap:suggestionXY values to the functions. These are SPARQL CONSTRUCT queries that may construct zero or more instances of the function, with partially filled in fields, as well as a spinmap:suggestionScore. See the function spif:parseDate for an example of what can be done with this mechanism.

Monday, April 04, 2011

SPIN is a W3C Member Submission

The SPARQL Rules language SPIN has evolved over the last couple of years as an integral part of TopQuadrant's TopBraid Suite. SPIN started during a discussion between Dean Allemang and myself, in which we brainstormed about having an RDF syntax for SPARQL. I went ahead and implemented this based on Jena's ARQ API, and the result eventually became the SPIN RDF Syntax. This was no rocket science, because similar ideas of representing higher level languages by means of RDF blank node structures had been explored by OWL and SWRL.

Prior to our work on SPIN, we had already experimented with various mechanisms to link SPARQL queries with RDF data structures, so that they could be shared as query libraries. TopBraid veterans may remember the sparql:query property that was introduced to store SPARQL queries (as strings) together with RDF models. So while I was working on the SPIN RDF Syntax, I noticed that we now have a much better way of achieving this goal. A quick cross-reference to object-oriented languages led to me select properties such as spin:rule and spin:constraint to point from a class to a SPARQL query, expressed in RDF. This later became the SPIN Modeling Vocabulary.

Once I had the rules and constraint mechanism in place, I noticed that many rules and constraints were following similar patterns, with just one or two values different in each rule. This led to the creation of SPIN Templates. Templates then became the foundation of user-defined SPIN Functions. With those two pieces in place, SPIN suddenly became a language that was fundamentally different (and better) than what similar languages such as SWRL provided, because it became possible for users to define their own modeling vocabulary, and even extend the expressivity of SPARQL.

The first version of SPIN was published as part of TopBraid Composer in January 2009. Since then, it was positively received by our user community and practical use cases have enabled us to fine tune and extend the language over the years. Now, around three years after its first experimental versions, we found the time was right to officially share SPIN with the broader community, and make clear that it is not a proprietary TopQuadrant technology. Together with James Hendler and Kingsley Idehen, we put together a SPIN W3C Member Submission that has just been published on the W3C site.

The status of a Member Submission means that TopQuadrant encourages other tool vendors to also provide SPIN implementations, and as I have heard there is work in progress already. The Member Submission also indicates that SPIN may play a role as input to future revisions of other standards such as RIF. This is all very good. Of course a full spec of SPIN as an official W3C standard would be even better, but going through the whole standardization process is a long and difficult journey. Given that SWRL had become a similar de-facto standard with Member Submission status alone indicates to me that SPIN has good chances of achieving the same. In fact I strongly believe that the fact that SPIN is based on SPARQL will be crucial in winning the hearts and minds of many Semantic Web and Linked Data enthusiasts. SPIN can co-exist with other languages including OWL 2 RL and SKOS. SPIN doesn't require any special execution engine apart from a SPARQL store. The learning curve is very low for anyone who already knows SPARQL. SPIN is part of the Semantic Web technology stack.

A good place to start learning SPIN is the TopBraid SPIN page, with screenshots and links to a tutorial. For programmers, there is an open source SPIN API available.