Validating schema.org Microdata with SPIN
The new 3.5.1 version of TopBraid Composer introduces initial features to import, browse, edit and analyze Microdata. I wrote about this in a previous blog entry; if you want to try those features, just download TBC's evaluation version, keeping in mind that Microdata support is still at an early stage and that, for example, the parser is not yet optimized for speed. Today I will focus on a SPARQL-based approach for validating schema.org Microdata using SPIN inside of TopBraid.
I have published a library of SPIN constraints at http://topbraid.org/spin/schemaspin. This library currently includes 11 types of integrity constraints covering various aspects of the schema.org ontology, as shown below.
The most basic tests make sure that schema.org properties are only used on classes with matching domains, and that the values of those properties match their declared ranges. These tests alone may help you identify many potential errors at edit time. Another generic test is associated with owl:IrreflexiveProperty, making sure that the value of properties such as children, colleagues, follows and knows doesn't point back to the subject itself.
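To illustrate the general pattern, here is a minimal sketch of what such an irreflexivity check could look like as a SPIN constraint. This is my own simplified version rather than the exact query from the schemaspin library; in SPIN, ?this is bound to each instance of the class the constraint is attached to, and an ASK query reports a violation whenever it evaluates to true:

    # Minimal sketch (not the actual schemaspin query) of an irreflexivity
    # check, attached e.g. as a spin:constraint to schema:Person.
    # A violation is raised for every person who "knows" themselves.
    PREFIX schema: <http://schema.org/>
    ASK WHERE {
        ?this schema:knows ?this .
    }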
Other tests cover family relationships (e.g. children must be born after their parents, the children relationship cannot contain cycles, and birthDate must lie before deathDate), check that email addresses match a regular expression, and verify that longitudes and latitudes fall within their valid ranges. The following TBC screenshot shows an example of an invalid longitude:
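In SPIN terms, such a range check could be sketched roughly as follows. Again, this is my own approximation rather than the library's exact query; it would be attached to schema:GeoCoordinates and report a violation whenever the longitude lies outside the interval from -180 to 180:

    # Simplified sketch of a longitude range check (not the exact schemaspin
    # query). ?this is the schema:GeoCoordinates instance being validated;
    # the CONSTRUCT produces a SPIN constraint violation for out-of-range values.
    PREFIX schema: <http://schema.org/>
    PREFIX spin:   <http://spinrdf.org/spin#>
    PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
    CONSTRUCT {
        _:v a spin:ConstraintViolation ;
            spin:violationRoot ?this ;
            spin:violationPath schema:longitude ;
            rdfs:label "longitude must be between -180 and 180" .
    }
    WHERE {
        ?this schema:longitude ?long .
        FILTER (xsd:double(?long) < -180 || xsd:double(?long) > 180) .
    }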
A really powerful demonstration of ontology reuse and linkage is the constraint that validates currency codes. There is a finite vocabulary of those, including EUR and USD. The schemaspin ontology simply imports the QUDT namespace that already defines all of those abbreviations, and reports an error if an unknown currency is used on a Microdata page. The following screenshot shows the underlying SPIN constraint (note that this spin:constraint is attached to a property metaclass that marks all properties holding currencies as values):
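Without reproducing the exact query from the screenshot, the idea could be sketched roughly as follows. Because the constraint lives on a property metaclass, ?this is bound to each currency-holding property (such as schema:priceCurrency) rather than to an instance; the names qudt:CurrencyUnit and qudt:abbreviation are my assumptions about the imported QUDT vocabulary and may differ from what schemaspin actually uses:

    # Rough sketch of a currency-code check (not the screenshot's query).
    # ?this is a property declared to hold currency codes; a violation is
    # reported for any value that matches no known QUDT currency abbreviation.
    # qudt:CurrencyUnit and qudt:abbreviation are assumed names.
    PREFIX qudt: <http://qudt.org/schema/qudt#>
    ASK WHERE {
        ?resource ?this ?code .
        FILTER NOT EXISTS {
            ?unit a qudt:CurrencyUnit ;
                  qudt:abbreviation ?code .
        }
    }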
The constraints above are just a beginning. Much more interesting constraints could be defined as more data gets published on the Semantic Web, e.g. by asking the Sindice SPARQL endpoint, via the SPARQL 1.1 SERVICE keyword, to verify that a link to a Person really describes a known http://schema.org/Person, or by comparing prices to make sure that my offer of a product currently has the lowest price on the market.
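To make the first of those ideas a bit more concrete, here is a hypothetical sketch of such a SERVICE-based check. The endpoint URL below is only a placeholder, not Sindice's actual address, and the exact query would of course depend on what the remote dataset provides:

    # Hypothetical sketch of a cross-dataset check using the SPARQL 1.1
    # SERVICE keyword. The endpoint URL is a placeholder. A violation is
    # reported if a schema:knows link points to a resource that the remote
    # endpoint does not know as a schema:Person.
    PREFIX schema: <http://schema.org/>
    ASK WHERE {
        ?this schema:knows ?person .
        FILTER NOT EXISTS {
            SERVICE <http://example.org/sparql> {
                ?person a schema:Person .
            }
        }
    }

The open architecture of SPIN and the richness of SPARQL make adding these and other domain-specific constraints easy and enjoyable.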