Tuesday, July 05, 2011

Validating schema.org Microdata with SPIN

The new 3.5.1 version of TopBraid Composer introduces some initial features to import, browse, edit and analyze Microdata. I wrote about this in a previous blog entry - if you want to try those features just download TBC's evaluation version, keeping in mind that Microdata support is still at an early stage and that, for example, the parser isn't fast yet. Today I will focus on a SPARQL-based approach for validating schema.org Microdata using SPIN inside of TopBraid.

I have published a library of SPIN constraints at http://topbraid.org/spin/schemaspin. This library currently includes 11 types of integrity constraints covering various aspects of the schema.org ontology, as shown below.

The most basic tests make sure that schema.org properties are only used on classes with matching domains, and that the values of those properties match their declared ranges. These tests alone may help you identify many potential errors at edit time. Another generic test is associated with owl:IrreflexiveProperty, making sure that a value of children, colleagues, follows, knows, etc. doesn't point to itself.
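
As a rough illustration of the domain and irreflexivity tests, here is a minimal Python sketch. It is not how the schemaspin library works (that is written as SPIN constraints in SPARQL). The dictionaries, the ex: resources and the single-domain simplification are all assumptions made for the example, and subclass reasoning is ignored.

```python
# Hypothetical in-memory stand-ins for the ontology and the page data.
TYPES = {"ex:alice": "schema:Person", "ex:acme": "schema:Organization"}
DOMAINS = {"schema:birthDate": "schema:Person", "schema:knows": "schema:Person"}
IRREFLEXIVE = {"schema:knows", "schema:children"}


def violations(triples):
    """Return messages for domain and irreflexivity violations."""
    found = []
    for subject, predicate, obj in triples:
        domain = DOMAINS.get(predicate)
        # Domain test: the subject must have the declared domain as its type.
        if domain is not None and TYPES.get(subject) != domain:
            found.append(f"{predicate} used on {subject}, which is not a {domain}")
        # Irreflexivity test: the value must not be the subject itself.
        if predicate in IRREFLEXIVE and subject == obj:
            found.append(f"{predicate} on {subject} points to itself")
    return found


triples = [
    ("ex:alice", "schema:knows", "ex:alice"),
    ("ex:acme", "schema:birthDate", "1999-01-01"),
]
for message in violations(triples):
    print(message)
```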

Other tests check various family relationships (e.g. children must be born after their parents, children cannot contain cycles, and birthDate must be before deathDate), that emails match certain regular expressions, and that longitudes and latitudes fall within their valid ranges. The following TBC screenshot shows an example of an invalid longitude:
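
The value-level checks mentioned above are simple to state. The following Python sketch shows the idea only; the function names are made up for this example, and the email pattern is deliberately loose and is not the regular expression used by the library.

```python
import re
from datetime import date

# Deliberately loose pattern, chosen for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def valid_coordinates(latitude, longitude):
    """Latitudes range from -90 to 90, longitudes from -180 to 180."""
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


def valid_life_span(birth, death):
    """birthDate must be before deathDate."""
    return birth < death


def valid_email(value):
    """True if the value looks roughly like name@host.tld."""
    return EMAIL_PATTERN.match(value) is not None


print(valid_coordinates(48.2, 16.4))
print(valid_coordinates(48.2, 200.0))
print(valid_life_span(date(1917, 5, 29), date(1963, 11, 22)))
print(valid_email("not-an-email"))
```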

A really powerful demonstration of ontology reuse and linkage is the constraint that validates currency codes. There is a finite vocabulary of those, including EUR and USD. The schemaspin ontology simply imports the QUDT namespace that already defines all of those abbreviations, and reports an error if an unknown currency is used on a Microdata page. The following screenshot shows the underlying SPIN constraint (note that this spin:constraint is attached to a property metaclass that marks all properties holding currencies as values):
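
Conceptually, the currency check is a membership test against a controlled vocabulary. In the Python sketch below, the small hard-coded set is only a stand-in for the currency definitions that the real constraint obtains from QUDT, and the function name is hypothetical.

```python
# Stand-in for the currency codes defined by QUDT; the real list is much longer.
KNOWN_CURRENCIES = {"EUR", "USD", "GBP", "JPY"}


def unknown_currencies(values):
    """Return the values that are not known currency codes, in input order."""
    return [value for value in values if value not in KNOWN_CURRENCIES]


print(unknown_currencies(["EUR", "USD", "EURO"]))
```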

The constraints above are just a beginning. Much more interesting constraints could be defined if more data is published on the Semantic Web, e.g. by asking the Sindice SPARQL endpoint with the SPARQL SERVICE keyword to validate that a link to a Person really describes a known http://schema.org/Person, or to compare prices to make sure that my offering of a product is currently the lowest price on the market. The open architecture of SPIN and the richness of SPARQL make adding these and domain-specific constraints easy and enjoyable.

Sunday, July 03, 2011

SPARQL Web Pages made easy

SPARQL Web Pages (SWP, aka UISPIN) is a templating language for HTML and XML formats that operates on RDF data. In a nutshell, SWP makes it possible to embed SPARQL expressions and queries directly into web page snippets, and to link an RDF or OWL ontology with such SWP snippets. SWP can also be used to generate JSON callback results to support Ajax-style patterns. This basically means that application developers can cover the whole software stack ranging from model to control to view with RDF-based representations only.

At TopQuadrant, we have meanwhile made substantial use of SPARQL Web Pages in internal and customer-facing projects, and have introduced several SWP enhancements with TopBraid 3.5. One thing that several people asked for was SWP support for stand-alone web pages that are not necessarily linked to specific classes in an ontology. In response to this, we have introduced *.swp files, which can be used like PHP or JSP documents with a TopBraid server.

In order to create such SWP files, go to File > New > SPARQL Web Pages file. This will create a stub file with some content that will help you get started:

Through its base platform Eclipse, TopBraid Composer includes a powerful HTML editor, and TBC 3.5.1 includes syntax highlighting for SWP built-ins (see above, you may need to associate *.swp files with the HTML content type in the Eclipse preferences as described in the TBC Help).

As soon as you have created this file, you can immediately execute it via the personal TopBraid Live server that is built into TopBraid Composer. Just visit http://localhost:8083/tbl/test.swp in your browser:

The example above takes one argument (test) and inserts this into the greeting. The expression {= ?test } will insert the current value of the SPARQL variable ?test into the output document. In our example, ?test is fetched from the URL arguments via the built-in function ui:param(). The demo page then creates a simple loop over all instances of kennedys:Person in the query graph and inserts them into an unordered HTML list. The actual query graph is specified using the ui:setContext tag - if this isn't present it will use the default graph.
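
To make the idea of the {= ?test } expression concrete, here is a purely conceptual Python sketch of substituting variable bindings into a page snippet. It says nothing about how SWP is implemented, and it handles only plain variable references, whereas real SWP expressions can be arbitrary SPARQL expressions. The render function and the sample markup are assumptions for this example.

```python
import html
import re

# Matches expressions of the form {= ?name } that contain a single variable.
EXPRESSION = re.compile(r"\{=\s*\?(\w+)\s*\}")


def render(template, bindings):
    """Replace each {= ?name } with the HTML-escaped value bound to name."""
    return EXPRESSION.sub(
        lambda match: html.escape(str(bindings[match.group(1)])), template
    )


# In SWP the value would come from the URL via ui:param(); here it is passed in directly.
print(render("<p>Hello, {= ?test }!</p>", {"test": "World"}))
```

An unbound variable raises a KeyError in this sketch; a real template engine would need a defined policy for that case.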

For a more complete live demo of SWP, visit http://spinservices.org:8080/spin/doc.swp providing documentation of the GoodRelations ontology:

For anyone with experience in hand-editing HTML and JSP or PHP, SPARQL Web Pages should look quite familiar. In fact, SWP borrows ideas from other well-known languages such as loops, assignments, if-then-else branching and user-defined tags, but is 100% SPARQL. With RDF nodes as first-class citizens, this language is IMHO an attractive alternative for projects that use RDF as their primary data representation or integration format.

Thursday, June 09, 2011

Microdata and RDFa in TopBraid Composer

The next release of TopBraid Composer will include comprehensive support for editing and processing schema.org Microdata, and will also have improved support for RDFa. TopBraid is an extension of Eclipse and thus inherits a lot of goodness from the platform, including a very nice HTML editor. It was straightforward and highly desirable to extend TopBraid with native support for those Web Data formats. Here is a preview of what it will look like.

Working with Microdata and RDFa

When I started exploring Microdata for my own web site, I created a new Eclipse project within TopBraid Composer containing the HTML, CSS and image files for the site.

While I was adding the Microdata tags to the HTML documents, I quickly discovered that RDF based tooling can be extremely helpful to make sure that the published metadata is consistent and of good quality. For example, data about entities (such as the http://schema.org/Person about myself) is split across multiple HTML pages: the front page contains my address, but my personal page contains information about my children. In such cases it is important that both pages use the same identifiers for the same Linked Data entities. This becomes even more important if we want to link to external standard vocabularies, such as ontologies about units, countries or product categories.

Linked Web Data is much more useful than isolated data snippets on individual pages.

As a result of this, I introduced the notion of Web Data Sites into TopBraid Composer - collections of pages in the same folder and its sub-folders. Right click on the project above and select New > Microdata Site File (or RDFa Site File). This opens a wizard with an option for default ontologies to include. For Microdata this is obviously the schema.org namespace, but any other RDF vocabulary can be added later:
This creates a site file (*.mds) that acts as a placeholder for all RDF triples on the HTML pages within the same folder and its subfolders. The site file can be opened like any other RDF data source, imported into other data models, etc. When opened, it will scan the HTML files and automatically stay up to date whenever the data in the HTML pages changes.
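
To give a feel for what scanning an HTML page for Microdata involves, here is a minimal Python sketch based on the standard library's html.parser. It is not TopBraid's parser: it only collects itemprop values that are given as element text, and it ignores nested items, itemid, and attribute-based values such as href or content. The class name and the sample markup are made up for this example.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collects (itemprop, text) pairs from elements whose value is their text."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # itemprop name of the element being read, if any

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("itemprop")
        if name is not None:
            self._current = name

    def handle_data(self, data):
        # The first non-blank text after an itemprop start tag is taken as its value.
        if self._current is not None and data.strip():
            self.pairs.append((self._current, data.strip()))
            self._current = None


page = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <span itemprop="email">jane@example.org</span>
</div>
"""
collector = ItempropCollector()
collector.feed(page)
print(collector.pairs)
```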

The screenshot below (click on the image for the full size) shows some of the new TBC capabilities in practice.

You can see that TopBraid has built-in views to browse the class hierarchy, properties and instances. These are powerful mechanisms to navigate through the data space that is encoded in the HTML pages. In the example above, you can see that my current Microdata pages contain information about three Persons, as well as various address and location objects. The class tree shows the number of instances of each class. A double-click on an instance will display it on a form. You can see the form view of the resource http://knublauch.com (representing myself as a schema:Person) on the right. Here is a larger view, with the details of one of the children objects opened up:


Alternative views such as graphs and smart browser displays are also built-in. Here is a TBC graph view of some instances:


Analyzing Web Data with SPARQL and SPIN

You can also run SPARQL queries over this data:

We have a lot of SPARQL-based features built into the TopBraid platform, including the rule and constraint language SPIN (now a W3C Member Submission). SPIN is useful to define model-based integrity constraints, and I have started to create a SPIN constraints library for the schema.org namespace. Currently this checks that the value type of properties on the HTML pages matches the range defined by the ontology, but more checks will be added, for example regular expressions of emails, country abbreviations etc. More on this in a separate entry some day.

Editing Microdata and RDFa

Once you have checked constraints and the system reports a violation, you can navigate to the source of the violation on the form of the relevant instance. From those forms, you simply need to double-click on the icon to the left of the value to navigate to the HTML source code:

At this stage, the circle is complete and you are back in the HTML document, where you can fix problems (e.g. a misspelled email address). Save the HTML file, and the RDF triples (on the form and elsewhere) will update automatically.

The HTML editor in TopBraid Composer has been enhanced with syntax highlighting for the Microdata attributes such as itemprop. And more is on its way...

Harvesting Microdata and RDFa from the web

In addition to editing and processing local Web Data files, TopBraid can also be used to work with external mark-up from existing pages. TBC Version 3.5 had already introduced the Web Data Basket, and we have extended this to also support Microdata. The mechanism is simple yet powerful: you install a small Firefox extension that will send the pages you visit to your locally running TopBraid Composer. This will collect all RDF metadata contained on the visited pages, and make it available to the RDF, OWL and SPARQL machinery of TBC. This means you can simply browse the web and you will automatically get the stream of RDF triples into your working environment.

Tuesday, April 26, 2011

Faceted Search with TopBraid and SWP

Many Semantic Technology companies offer some kind of faceted browsing tool. With TopBraid 3.5 it was time for TopQuadrant to say "me too", and add some unique capabilities into the mix.

The main idea of faceted browsing is to allow users to narrow down a set of objects by selecting properties that the sought-after objects must possess. For example, if you search for people in the infamous Kennedy ontology, you may want to find all instances of Person that went to the same university and share the same profession. TopBraid's faceted search component follows a user interface paradigm made popular by Freebase Parallax: you start with a set of all Persons and the system will compute how many matches are in each category. Clicking on a category will narrow down the set, and you can add the next condition. The following screenshot illustrates this, with the facet "alma mater" narrowed down to "Harvard University".
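
The counting and narrowing steps can be sketched in a few lines of Python. This is only an illustration of the paradigm, not the TopBraid servlets or their JavaScript library; the records, property names and function names are invented for the example.

```python
from collections import Counter

# Invented sample records standing in for instances of a Person class.
PEOPLE = [
    {"name": "Person A", "almaMater": "Harvard University", "profession": "Lawyer"},
    {"name": "Person B", "almaMater": "Harvard University", "profession": "Politician"},
    {"name": "Person C", "almaMater": "Stanford University", "profession": "Lawyer"},
]


def facet_counts(items, facet):
    """Count how many items fall into each category of the given facet."""
    return Counter(item[facet] for item in items if facet in item)


def narrow(items, facet, value):
    """Keep only the items whose facet has the selected value."""
    return [item for item in items if item.get(facet) == value]


print(facet_counts(PEOPLE, "almaMater"))
harvard = narrow(PEOPLE, "almaMater", "Harvard University")
print(facet_counts(harvard, "profession"))
```

After each narrowing step, the counts are recomputed on the remaining set, which is what lets the user add the next condition.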

TopBraid's faceted search is implemented by a collection of TopBraid Live servlets and a JavaScript UI library. What you see on the screen above is in fact a web browser embedded into TopBraid Composer. The default stylesheet is simple and can be customized, and it's also possible to use the same JavaScript library in completely different web applications.

One distinguishing capability of TopBraid's Faceted Search support is its customizability. SPARQL Web Pages (SWP) technology can be used to customize the visual appearance of the preview results on the right hand side. A key benefit of SWP is the linkage between ontologies and user interface snippets. Basically, SWP allows you to attach HTML snippets to any RDFS or OWL class in your domain model using the property ui:instanceView, and the system is then able to dynamically select the best suitable visualization for any object that it gets. For example, the visualization for kennedys:Person can be changed as shown below.

The faceted search component looks for visualizations marked with ui:id="facetSummary", and will display them as shown below.

Further customizations are possible without any programming: For example you can specify which properties shall be visible by default, and which properties shall not be selectable as facets.

There is more to be said about this new capability. But if you just want to get started, use TBC-ME 3.5, select the class that you want to search instances of, switch to the Browser tab and pick the facet.ui:SearchView view in the drop down list. Note that on Windows this currently does not work because Eclipse includes an outdated internal web browser. Please use the button Open current page in external browser. As with any new feature, we appreciate your feedback.

TopBraid Composer's Web Data Basket: Collecting Linked Data while you browse

One of the little new features in TopBraid Composer 3.5 is the Web Data Basket view. This can be used to incrementally download Linked Data (either RDFa or RDF) while browsing the web. The best way to experience this is by getting a small TBC Firefox extension. This will add a tiny TopBraid button to the lower right corner of your browser.
Click on this button while TopBraid Composer is executing, and all RDF data encoded on the currently visited page will be added to TBC's Web Data Basket:

While this Basket displays the raw triples, it also has options to add the loaded triples into the current model. For example, you will get a proper foaf:Person for David Bowie if you visit his DBpedia page:

In order to facilitate the use of this data, TopBraid Composer will automatically add missing imports to namespaces such as foaf and skos. When you follow a hyperlink in your web browser, the basket will get more content. This means that the system will accumulate any Linked Data into TopBraid as you navigate through the web.

This little Web Data Basket makes it easy to collect Linked Data without having to leave your favorite tools. I think it provides a fine example of how Linked Data could be used, e.g. to build up a shopping list of products backed with GoodRelations data.

Thursday, April 21, 2011

SPINMap: SPARQL-based Ontology Mapping with a Graphical Notation

One of the new features in the upcoming TopBraid 3.5 release is called SPINMap. SPINMap is a SPARQL-based language to represent mappings between RDF/OWL ontologies. These mappings can be used to transform instances of source classes into instances of target classes. This is a very common requirement to create Linked Data, for example starting with spreadsheets, XML files or databases, but also from one domain-specific ontology into a more generic one. As a first impression, here is a picture of SPINMap in action:


If you would like to learn about this with a visual demo, please take a look at the tutorial video.


In the rest of this blog entry I will cover similar content to the video, but with screenshots and prose.

Introduction to SPINMap

SPARQL is a rich language that can be used for many purposes. The SPARQL CONSTRUCT keyword is particularly useful to define rules that map from one graph pattern (in the WHERE clause) to another graph pattern. This makes it possible to define sophisticated rules that map instances from one class to instances of another one.

The SPIN framework provides several mechanisms that make the definition of such SPARQL-based mapping rules easier. In particular, SPIN makes it easy to associate mapping rules with classes, and SPIN templates and functions can be exploited to define reusable building blocks for typical modeling patterns.

The SPINMap vocabulary (http://spinrdf.org/spinmap) is a collection of reusable design patterns that reflects typical best practices in ontology mapping. SPINMap models can be executed in conjunction with other SPARQL rules with any SPIN engine. The main advantage of SPINMap is that it provides a higher-level language that is suitable to be edited graphically. TopBraid Composer 3.5 provides a visual editor that makes it easy to establish ontology mappings using drag and drop, and filling in forms.

It is a good practice to store the ontology mapping rules in files separate from the source and target files. The mapping file only needs to import the SPINMap namespace (which in turn imports SPIN etc). The easiest way to get started is to use File > New > RDF/OWL/SPIN File... and then to activate the check box for "SPINMap Ontology Mapping Vocabulary", as shown below.
This will create an empty file importing http://topbraid.org/spin/spinmapl. As a next step, you should drag the source and target ontologies into the Imports view so that those get imported into the mapping ontology. Then select the class you want to start mapping, and switch to the Diagram tab. In the example below, the source ontology A defines a class a:Person, and we want to map it into the target class b:Customer.


Use drag and drop (e.g. from the Classes view) to add other classes to the Diagram. If the SPINMap namespace is present, the Diagram will provide additional capabilities and use a different layout algorithm than usual. If you move the mouse over a class, a triangular anchor point will appear in the upper right corner of the class box. It will turn green if you move the mouse over it, and if it can be made the source of a mapping. Click on this and keep the mouse button pressed to establish a link to another class. Move the mouse over the incoming upper anchor of the target class and release the mouse. A dialog like the one below will appear.


This dialog is used to create a "mapping context" that is later used to determine how the target instances shall be selected from the source instances. In particular this is used to construct URIs from the values of a given resource, e.g. so that a:Instance-0-1 is turned into b:John-Smith. The dialog provides a collection of target functions that can be used for that purpose. You simply need to pick an appropriate function and fill in the blanks to establish a mapping context. In the example screenshot, a new URI is constructed from the values of the source properties a:firstName and a:lastName and a provided URI template. This assumes that those properties together serve as unique identifiers, similar to primary keys in a database. Other algorithms can be created if needed through SPIN functions.
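
The idea behind such a target function can be sketched in Python as follows. This is not the implementation of the SPINMap target functions; the build_uri name, the {0}/{1} placeholder syntax and the example.org namespace are assumptions made for the illustration.

```python
from urllib.parse import quote


def build_uri(template, *values):
    """Insert cleaned-up source values into a URI template with {0}, {1}, ... slots."""
    cleaned = (quote(value.strip().replace(" ", "-")) for value in values)
    return template.format(*cleaned)


# The values of a:firstName and a:lastName together act like a primary key.
print(build_uri("http://example.org/b#{0}-{1}", "John", "Smith"))
```

Because the URI is derived from the values alone, two source instances with the same first and last name would end up as the same target resource, which is why the properties need to be unique in combination.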

As soon as you have filled in all required arguments of the mapping context function, the preview panel of the dialog will give you an idea of what the resulting values will look like. When you are happy with this, press OK.

The resulting context will be displayed with a yellow graph node as shown below.


If you ever need to edit this context node again, e.g. to change the URI template, just double-click on it. Right-clicking the node opens a context menu with an option to delete it.

Once a context has been established between two classes, the user interface makes it possible to add transformations. In the example above, the source class has a property a:dob that holds date of birth values as raw strings, such as "30/04/1985". We want to map this into the target property b:birthDate, which is a well-formed xsd:date in the format "1985-04-30". TopBraid's SPARQL library provides a built-in function spif:parseDate to make this task easier. Use the mouse to draw a connection from a:dob to b:birthDate. A dialog such as the following will appear.


In this dialog you can either manually select a transformation function, or check if the system has any suggestions for you, on the Suggestions tab. In this case, the system suggests spif:parseDate with pre-defined patterns to convert raw dates into valid xsd:date literals. Pressing OK creates a mapping transformation as shown below.


At any point in time, TopBraid Composer makes it easy to try the mapping out. Assuming TopSPIN is the selected inference engine, just press the Run Inferences button in the main tool bar to see the results.


As you can see above, each instance of the a:Person class has been mapped into a corresponding instance of b:Customer. The URI of the target resources has been generated using the string insertion template based on first name and last name. Furthermore, proper birth dates have been generated from the raw source strings. The context menu of the Inferences view provides options to assert the resulting RDF triples if desired, or you can use the Triples View to move them elsewhere.
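
The date conversion in this example, from raw strings such as "30/04/1985" to the xsd:date form "1985-04-30", can be sketched in Python as below. This only mirrors the idea of trying pre-defined patterns; it is not the implementation of spif:parseDate, and the pattern list is an assumption.

```python
from datetime import datetime

# Assumed patterns, tried in order; the first one that fits wins.
PATTERNS = ("%d/%m/%Y", "%Y-%m-%d")


def parse_date(raw, patterns=PATTERNS):
    """Return the date in ISO format (the xsd:date lexical form), or None if no pattern fits."""
    for pattern in patterns:
        try:
            return datetime.strptime(raw, pattern).date().isoformat()
        except ValueError:
            continue
    return None


print(parse_date("30/04/1985"))
```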

It is possible to add any number of other transformations in similar ways. Some transformations take more than one argument. In that case, additional input anchor points will be displayed, as shown for the node "concat with separator" below.


Note that a complex example like the one above uses a number of different design patterns. Some additional patterns are explained in the tutorial video, which I would strongly recommend if you want to save time with this technology.

Understanding and Extending SPINMap

The mini tutorial above might be enough for many users to get started. For advanced users with knowledge of SPIN, the following background may be helpful to understand how SPINMap works, and how it can be extended.

SPINMap is an entirely declarative application of SPIN. This means you can explore the mappings generated by the visual editor from an RDF perspective, e.g. using TBC forms. In the example above, the form for a:Person displays a collection of SPIN Template calls:


You can drill into the templates by opening up the + sign that appears when you hover the mouse over the template icon.


The example above illustrates that SPINMap is based on a (small) collection of generic templates, such as spinmap:Mapping-2-1 which represents a mapping from 2 source properties into 1 target property. Each of those templates is linked to a spinmap:Context which is used at execution time to determine the target URIs. Furthermore, the argument spinmap:expression points to a SPARQL expression, SELECT or ASK query, or even a constant URI or literal that is used to compute the target value from the source value(s). The SPINMap templates use the function spin:eval to evaluate those expressions at execution time. When executed, the expression will be invoked with pre-assigned values for ?arg1, ?arg2 etc, based on the current values of spinmap:sourcePredicate1 on the source instances.
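
The shape of a 2-to-1 mapping can be illustrated with a small Python sketch. It is only an analogy for the template mechanism: the apply_mapping function, the dictionary representation of an instance, and the b:fullName property are invented here, and a Python callable takes the place of the SPARQL expression that spin:eval would evaluate.

```python
def apply_mapping(source, source_predicates, target_predicate, expression):
    """Read the source values, pass them as arg1, arg2, ... and store the result."""
    args = [source[predicate] for predicate in source_predicates]
    return {target_predicate: expression(*args)}


def concat_with_separator(arg1, arg2, separator=" "):
    """Plays the role of the expression: computes one target value from two inputs."""
    return f"{arg1}{separator}{arg2}"


person = {"a:firstName": "John", "a:lastName": "Smith"}
print(apply_mapping(person, ["a:firstName", "a:lastName"], "b:fullName", concat_with_separator))
```

Keeping the expression separate from the generic mapping step is what allows the same small set of templates to be reused with any function.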

Since in practice any SPARQL function can be used as spinmap:expression, users can also add their own SPIN functions where appropriate. It is also possible to use the built-in SPARQL functions such as xsd:string().
The mapping context uses a similar mechanism, also based on spin:eval to create target URIs. You can open any instance of spinmap:Context to see how this is done.

In the example above, the target function spinmapl:buildURI2 is used to derive a new URI from two input properties and a template. You are free to define your own target functions there, as long as they are instances of spinmap:TargetFunction (and subclass of spinmap:TargetFunctions).

If you are writing your own functions, or want to make the system smarter, you can add your own spinmap:suggestionXY values to the functions. These are SPARQL CONSTRUCT queries that may construct zero or more instances of the function, with partially filled in fields, as well as a spinmap:suggestionScore. See the function spif:parseDate for an example of what can be done with this mechanism.

Monday, April 04, 2011

SPIN is a W3C Member Submission

The SPARQL Rules language SPIN has evolved over the last couple of years as an integral part of TopQuadrant's TopBraid Suite. SPIN started during a discussion between Dean Allemang and myself, in which we brainstormed about having an RDF syntax for SPARQL. I went ahead and implemented this based on Jena's ARQ API, and the result eventually became the SPIN RDF Syntax. This was no rocket science, because similar ideas of representing higher level languages by means of RDF blank node structures had been explored by OWL and SWRL.

Prior to our work on SPIN, we had already experimented with various mechanisms to link SPARQL queries with RDF data structures, so that they could be shared as query libraries. TopBraid veterans may remember the sparql:query property that was introduced to store SPARQL queries (as strings) together with RDF models. So while I was working on the SPIN RDF Syntax, I noticed that we now had a much better way of achieving this goal. A quick cross-reference to object-oriented languages led me to select properties such as spin:rule and spin:constraint to point from a class to a SPARQL query, expressed in RDF. This later became the SPIN Modeling Vocabulary.

Once I had the rules and constraint mechanism in place, I noticed that many rules and constraints were following similar patterns, with just one or two values different in each rule. This led to the creation of SPIN Templates. Templates then became the foundation of user-defined SPIN Functions. With those two pieces in place, SPIN suddenly became a language that was fundamentally different (and better) than what similar languages such as SWRL provided, because it became possible for users to define their own modeling vocabulary, and even extend the expressivity of SPARQL.

The first version of SPIN was published as part of TopBraid Composer in January 2009. Since then, it has been positively received by our user community, and practical use cases have enabled us to fine tune and extend the language over the years. Now, around three years after its first experimental versions, we found the time was right to officially share SPIN with the broader community, and make clear that it is not a proprietary TopQuadrant technology. Together with James Hendler and Kingsley Idehen, we put together a SPIN W3C Member Submission that has just been published on the W3C site.

The status of a Member Submission means that TopQuadrant encourages other tool vendors to also provide SPIN implementations, and as I have heard there is work in progress already. The Member Submission also indicates that SPIN may play a role as input to future revisions of other standards such as RIF. This is all very good. Of course a full spec of SPIN as an official W3C standard would be even better, but going through the whole standardization process is a long and difficult journey. The fact that SWRL became a similar de-facto standard with Member Submission status alone indicates to me that SPIN has a good chance of achieving the same. In fact, I strongly believe that SPIN being based on SPARQL will be crucial in winning the hearts and minds of many Semantic Web and Linked Data enthusiasts. SPIN can co-exist with other languages including OWL 2 RL and SKOS. SPIN doesn't require any special execution engine apart from a SPARQL store. The learning curve is very low for anyone who already knows SPARQL. SPIN is part of the Semantic Web technology stack.

A good place to start learning SPIN is the TopBraid SPIN page, with screenshots and links to a tutorial. For programmers, there is an open source SPIN API available.

Tuesday, January 25, 2011

A Textual Syntax for SPARQLMotion

SPARQLMotion is an RDF-based scripting language that is suitable to be presented and edited graphically to form data processing pipelines. Many of our customers are using SPARQLMotion and we are constantly extending and refining the tools to make it more powerful. One of the recent enhancements that made it into TopBraid 3.4 is support for an alternative textual notation for SPARQLMotion. A spec for this notation can be found here:


An example of how this XML-based notation for SPARQLMotion can be used is shown in the sm:bodyScript field of the TopBraid Composer screenshot below.

The script above is stored in the same RDF-based format as other SPARQLMotion scripts, and can still be visualized graphically:

In the past few years since SPARQLMotion was created, several people had asked about a notation that can be edited with conventional text editing tools to create scripts. Among the advantages of a text-based notation is that it becomes easier to perform large-scale refactorings to move things around. It is sometimes simply faster, plus there is no need to "invent" artificial URIs for the nodes in a script. A great plus of the XML-based notation is that it becomes easy to insert SPARQL Web Pages (aka UISPIN) snippets directly into a single document. This can significantly accelerate the development of SPARQL-based web services.

The XML notation does have various limitations though. In particular it is only suitable for a subset of SPARQLMotion - there is no concept of multiple predecessor nodes in a linear notation.