Tuesday, July 05, 2011

Validating schema.org Microdata with SPIN

The new 3.5.1 version of TopBraid Composer introduces some initial features to import, browse, edit and analyze Microdata. I wrote about this in a previous blog entry - if you want to try those features just download TBC's evaluation version, keeping in mind that Microdata support is still at an early stage and that, for example, the parser isn't fast yet. Today I will focus on a SPARQL-based approach for validating schema.org Microdata using SPIN inside of TopBraid.

I have published a library of SPIN constraints at http://topbraid.org/spin/schemaspin. This library currently includes 11 types of integrity constraints covering various aspects of the schema.org ontology, as shown below.

The most basic tests make sure that schema.org properties are only used on classes with matching domains, and that the values of those properties match their declared ranges. These tests alone may help you identify many potential errors at edit time. Another generic test is associated with owl:IrreflexiveProperty, making sure that a value of children, colleagues, follows, knows, etc. does not point back to the resource itself.
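To make the idea concrete outside of SPIN, here is a minimal Python sketch of the domain, range and irreflexivity checks. It is not the schemaspin implementation: the SCHEMA table, the IRREFLEXIVE set and the check function are hypothetical names made up for illustration, data is modeled as plain tuples rather than RDF, and subclass reasoning is ignored.

```python
# Hypothetical mini-schema: property -> (allowed domain classes, allowed range classes).
# These entries are illustrative only and are not copied from schema.org.
SCHEMA = {
    "birthDate": ({"Person"}, {"Date"}),
    "knows": ({"Person"}, {"Person"}),
}

# Properties treated as irreflexive in this sketch.
IRREFLEXIVE = {"children", "colleagues", "follows", "knows"}


def check(triples, types):
    """Return a list of violation messages.

    triples: iterable of (subject, property, value) tuples
    types:   dict mapping each node to a single class name
    """
    violations = []
    for s, p, o in triples:
        if p in SCHEMA:
            domains, ranges = SCHEMA[p]
            # Domain check: the subject must be an instance of an allowed class.
            if types.get(s) not in domains:
                violations.append(f"{p}: domain mismatch on {s}")
            # Range check: the value must be an instance of an allowed class.
            if types.get(o) not in ranges:
                violations.append(f"{p}: range mismatch for value {o}")
        # Irreflexivity check: the value must not be the subject itself.
        if p in IRREFLEXIVE and s == o:
            violations.append(f"{p}: {s} points to itself")
    return violations
```

Calling check with a triple such as ("acme", "birthDate", "d1"), where acme is typed as an Organization, would report a domain mismatch. In SPIN the same logic is expressed as SPARQL queries attached to classes or properties.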

Other tests check various family relationships (e.g. children must be born after their parents, the children relationship must not contain cycles, and birthDate must be before deathDate), verify that email addresses match a regular expression, and make sure that longitudes and latitudes fall within their valid ranges. The following TBC screenshot shows an example of an invalid longitude:
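The same kinds of value checks can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the function names are hypothetical, the email pattern is a deliberately loose one and not the expression used in schemaspin, and the coordinate bounds are the usual -90..90 and -180..180 degrees.

```python
import re
from datetime import date

# Deliberately loose pattern: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def valid_email(value):
    """True if the value looks roughly like an email address."""
    return bool(EMAIL_RE.match(value))


def valid_coordinates(latitude, longitude):
    """True if both values lie within the usual degree ranges."""
    return -90 <= latitude <= 90 and -180 <= longitude <= 180


def valid_lifespan(birth, death):
    """True if the birth date is strictly before the death date."""
    return birth < death


def has_children_cycle(children, start):
    """True if `start` can be reached from itself by following children links.

    children: dict mapping a person to a list of that person's children
    """
    stack = list(children.get(start, []))
    seen = set()
    while stack:
        node = stack.pop()
        if node == start:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return False
```

For example, valid_coordinates(48.1, 190.0) fails because the longitude exceeds 180 degrees.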

A really powerful demonstration of ontology reuse and linkage is the constraint that validates currency codes. There is a finite vocabulary of those, including EUR and USD. The schemaspin ontology simply imports the QUDT namespace that already defines all of those abbreviations, and reports an error if an unknown currency is used on a Microdata page. The following screenshot shows the underlying SPIN constraint (note that this spin:constraint is attached to a property metaclass that marks all properties holding currencies as values):
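Stripped of the RDF machinery, the currency check boils down to a membership test against a known vocabulary. The following Python sketch assumes a small hard-coded set as a stand-in for the codes that QUDT defines; KNOWN_CURRENCIES and unknown_currencies are hypothetical names, and the real constraint obtains the vocabulary by importing QUDT rather than listing codes by hand.

```python
# Stand-in for the currency abbreviations defined in QUDT.
# Only a handful are listed here; the real vocabulary is much larger.
KNOWN_CURRENCIES = {"EUR", "USD", "GBP", "JPY"}


def unknown_currencies(offers):
    """Return the (offer id, code) pairs whose currency code is not known.

    offers: iterable of (offer id, currency code) pairs
    """
    return [(offer_id, code) for offer_id, code in offers
            if code not in KNOWN_CURRENCIES]
```

The benefit of the linked approach described above is that the vocabulary lives in one reusable ontology, so the constraint does not need to be updated when the list of currencies changes.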

The constraints above are just a beginning. Much more interesting constraints could be defined if more data is published on the Semantic Web, e.g. by asking the Sindice SPARQL endpoint with the SPARQL SERVICE keyword to validate that a link to a Person really describes a known http://schema.org/Person, or to compare prices to make sure that my offering of a product is currently the lowest price on the market. The open architecture of SPIN and the richness of SPARQL make adding these and domain-specific constraints easy and enjoyable.

Sunday, July 03, 2011

SPARQL Web Pages made easy

SPARQL Web Pages (SWP, aka UISPIN) is a templating language for HTML and XML formats that operates on RDF data. In a nutshell, SWP makes it possible to embed SPARQL expressions and queries directly into web page snippets, and to link an RDF or OWL ontology with such SWP snippets. SWP can also be used to generate JSON callback results to support Ajax-style patterns. This basically means that application developers can cover the whole software stack ranging from model to control to view with RDF-based representations only.

At TopQuadrant, we have since made substantial use of SPARQL Web Pages in internal and customer-facing projects, and have introduced several SWP enhancements with TopBraid 3.5. One thing that several people asked for was SWP support for stand-alone web pages that are not necessarily linked to specific classes in an ontology. In response to this, we have introduced *.swp files, which can be used like PHP or JSP documents with a TopBraid server.

In order to create such SWP files, go to File > New > SPARQL Web Pages file. This will create a stub file with some content that will help you get started:

Through its base platform Eclipse, TopBraid Composer includes a powerful HTML editor, and TBC 3.5.1 includes syntax highlighting for SWP built-ins (see above; you may need to associate *.swp files with the HTML content type in the Eclipse preferences, as described in the TBC Help).

As soon as you have created this file, you can immediately execute it via the personal TopBraid Live server that is built into TopBraid Composer. Just visit http://localhost:8083/tbl/test.swp in your browser:

The example above takes one argument (test) and inserts it into the greeting. The expression {= ?test } will insert the current value of the SPARQL variable ?test into the output document. In our example, ?test is fetched from the URL arguments via the built-in function ui:param(). The demo page then creates a simple loop over all instances of kennedys:Person in the query graph and inserts them into an unordered HTML list. The actual query graph is specified using the ui:setContext tag; if this tag isn't present, the default graph is used.
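For readers who have not used a templating language of this kind, the following Python sketch imitates just the substitution step: it reads an argument from the URL and replaces {= ?name } placeholders with the bound value. This is not how SWP is implemented. The render and params_from_url functions are hypothetical, the example query string is my own, and real SWP expressions can be arbitrary SPARQL expressions, not only bare variables.

```python
import re
from urllib.parse import urlparse, parse_qs

# Matches placeholders of the form {= ?name } with a bare variable only.
EXPR = re.compile(r"\{=\s*\?(\w+)\s*\}")


def params_from_url(url):
    """Return the first value of each query-string argument as a dict."""
    return {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}


def render(template, bindings):
    """Replace each {= ?name } with its bound value (empty string if unbound)."""
    return EXPR.sub(lambda m: str(bindings.get(m.group(1), "")), template)


if __name__ == "__main__":
    # Hypothetical request URL with a test argument.
    bindings = params_from_url("http://localhost:8083/tbl/test.swp?test=World")
    print(render("<h1>Hello, {= ?test }!</h1>", bindings))
```

In SWP itself the bindings come from SPARQL, which is what allows the same placeholder syntax to insert query results as well as URL arguments.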

For a more complete live demo of SWP, visit http://spinservices.org:8080/spin/doc.swp, which provides documentation of the GoodRelations ontology:

For anyone with experience in hand-editing HTML and JSP or PHP, SPARQL Web Pages should look quite familiar. In fact, SWP borrows ideas such as loops, assignments, if-then-else branching and user-defined tags from other well-known languages, but is 100% SPARQL. With RDF nodes as first-class citizens, this language is IMHO an attractive alternative for projects that use RDF as their primary data representation or integration format.