Wednesday, September 20, 2006

RDF/A support in TopBraid Composer

"OK, I have done my homework and created this nice RDF/OWL ontology. But how can I put it on my web page? What can I actually do with my ontology?" We have all seen many variations of this question on various mailing lists. Here is one potential answer.

RDF/A (sometimes spelled RDFa) is an evolving working draft in the context of W3C's Semantic Web activity. RDF/A can be used to embed RDF/OWL into XML languages. In particular, RDF/A defines a collection of attributes for embedding RDF data into XHTML pages.

This is a potentially very important step towards implementing the Semantic Web vision, because it provides an incremental entry point for mainstream Web developers. These people are typically experts in HTML, but find RDF rather scary to look at. RDF/A enables them to add a few attributes to their existing web pages, so that Semantic Web enabled browsers and agents can process the additional (meta) data in a machine-readable format.

To get an idea of how this works, have a look at Dean Allemang's class announcement page.

A user visiting this page with a browser will see a typical internet page describing upcoming seminars with links to a flyer, dates and locations. However, a closer look into the HTML source code reveals additional tags like the following:

The highlighted parts are RDF/A attributes that define RDF triples, here specifying the start and end time of an event as well as the geographical location of the venue. In the spirit of next-generation Web applications, this extra information can be extracted on the fly and used for all kinds of interesting services. For example, a simple Firefox plugin could let users drag cal:-tagged items into their personal calendar, or visualize geo:-tagged locations on a map.
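To make the idea of "attributes that define triples" concrete, here is a minimal Python sketch. It is not the markup from Dean's page and not a conforming RDF/A processor: the XHTML fragment, the dates, the coordinate and the function name `extract_triples` are all made up for illustration, and prefixed names such as cal:dtstart are kept as plain strings instead of being expanded to full URIs.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment; property names and values are illustrative only.
XHTML = """
<div xmlns:cal="http://www.w3.org/2002/12/cal/ical#"
     xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
     about="#seminar1">
  <span property="cal:dtstart" content="2006-10-16">October 16</span>
  <span property="cal:dtend" content="2006-10-19">October 19</span>
  <span property="geo:lat" content="38.9">Washington</span>
</div>
"""


def extract_triples(xhtml):
    """Collect (subject, property, value) triples from about/property/content.

    Simplification: the nearest enclosing 'about' attribute is the subject,
    and the value is 'content' if present, otherwise the element's text.
    """
    root = ET.fromstring(xhtml)
    triples = []

    def walk(element, subject):
        # An 'about' attribute changes the subject for this subtree.
        subject = element.get("about", subject)
        prop = element.get("property")
        if prop is not None and subject is not None:
            value = element.get("content")
            if value is None:
                value = "".join(element.itertext()).strip()
            triples.append((subject, prop, value))
        for child in element:
            walk(child, subject)

    walk(root, None)
    return triples


for triple in extract_triples(XHTML):
    print(triple)
```

A calendar or map plugin of the kind mentioned above would then only need to filter these triples by their cal: or geo: property names.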

In this spirit, our ontology development platform TopBraid Composer now also supports RDF/A data sources. Download version 1.2.2 and use the Import wizard to connect to the Web page link above. This will create a virtual proxy ontology which you can then import into other RDF/OWL projects. For example, you can import it into the geotravel ontology. A look at the Triples View will then reveal the additional triples:

These RDF/A triples are treated like any other subgraph of the overall model, i.e. you can run SPARQL queries on them, classify them according to OWL semantics, and visualize them. In this particular case, you can use Composer's Geography support to show the seminar venues on a Google map:

We are working on additional mash-up facilities within TopBraid Composer so that developers can benefit from very rapid turn-around times between ontology design and testing.

In order to get this going, I wrote an RDF/A parser based on SAX and Jena. During the development of this parser it became obvious that RDF/A is still evolving and not stable yet. However, the language is fairly small and therefore easy to adopt.

Many people argue that a limitation of RDF/A is that it only works for well-formed XML files. On the Web, of course, few people use XML-compliant HTML (XHTML), so adding RDF/A markup is often painful. We were hit by this problem ourselves when Dean added the RDF/A markup to his existing web page, a page with a long history of manual edits made with many different tools. We solved this problem by adding an optional pre-processor based on JTidy, an HTML to XHTML converter. With this pre-processor, TopBraid is much more forgiving of ill-formed HTML.
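The pre-processing idea can be sketched in a few lines of Python. This is not JTidy and does far less than JTidy does; it is a stand-in built on the standard library's `html.parser`, and the names `Tidier`, `tidy`, `VOID` and `MESSY` are hypothetical. It only quotes attribute values, self-closes a handful of void elements and closes tags that were left open, and it assumes the input has a single root element.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Elements that never have a closing tag in HTML (partial list, for the sketch).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class Tidier(HTMLParser):
    """Re-emit sloppy HTML as well-formed XML (very rough approximation)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the cleaned-up document
        self.stack = []  # currently open, non-void elements

    def handle_starttag(self, tag, attrs):
        rendered = "".join(
            " %s=%s" % (name, quoteattr(value if value is not None else name))
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append("<%s%s/>" % (tag, rendered))
        else:
            self.out.append("<%s%s>" % (tag, rendered))
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag: drop it
        # Close everything that was left open inside this element.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))


def tidy(html):
    parser = Tidier()
    parser.feed(html)
    parser.close()
    while parser.stack:  # close whatever is still open at end of input
        parser.out.append("</%s>" % parser.stack.pop())
    return "".join(parser.out)


# Unquoted attributes, an unclosed <p> and <span>, and a bare <br>.
MESSY = ('<div about=#seminar1><p>Ontology class<br>'
         '<span property=cal:dtstart content="2006-10-16">Oct 16</div>')

root = ET.fromstring(tidy(MESSY))  # parses; ET.fromstring(MESSY) would not
print(root.find(".//span").get("property"))
```

The cleaned-up output can then be handed to an XML-based parser, which is the role the JTidy step plays in front of the SAX-based RDF/A parser.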

Sincere acknowledgements to Elias Torres for providing me with helpful advice and test cases for the RDF/A parser, and to Dean Allemang, who suggested implementing this feature in the first place. Note that both his example web page and our RDF/A support are work in progress. Please stay tuned and let us know what you think!

Tuesday, September 05, 2006

Ontology Mapping with SPARQL CONSTRUCT

Ontology mapping is regarded as one of the key technologies for data integration, for example to mediate between databases that have different but similar schemas. A lot of papers have been published on the topic (see a State of the Art paper from 2005).

I am not an expert in this topic, but most approaches that I have seen so far seem to employ specialized mapping ontologies that define bridges, for example to map a property "name" in a source ontology to a property "lastName" in a target ontology. Mapping engines are then needed to interpret the mapping rules. Many research prototypes exist, but as far as I can tell the Semantic Web community has not yet settled on a standard mapping ontology, nor produced implementations that go beyond prototypes.

We are running some very promising experiments with using SPARQL for ontology mapping. SPARQL is best known as the upcoming W3C standard query language for RDF, but few people notice that besides its SELECT form, SPARQL also defines a CONSTRUCT form. The input of a CONSTRUCT query is a WHERE clause that describes a pattern in a source model, including variables. The output is an RDF graph produced by substituting each set of matching variable bindings into a graph template.

Here is an example screenshot from TopBraid Composer's new SPARQL visualization. In this example, a SPARQL query is used to convert instances of a source:Person class into instances of target:Person. To make this example more interesting, the string values of source:car are converted into instances of a class target:Car (that's why the query looks so scary).

For example, if you have a set of source instances Bob and Alice

then the output is a new subgraph in the target model, but with the cars as objects instead of strings:

The trick is that CONSTRUCT generates new triples, and these triples can be treated as "inferences" and added to the target model. TopBraid's SPARQL window displays what happens under the hood (the screenshot actually shows a different version of the query from above):
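For readers who want to see the mechanics without a SPARQL engine, the mapping can be imitated in plain Python. This is a toy sketch under loud assumptions: the query in the comment is my guess at what such a mapping could look like, not the query from the screenshot; the car names are invented; and `match` and `construct` are hypothetical helpers that treat every term as a string, with a leading "?" marking a variable and a leading "_:" marking a blank node.

```python
from itertools import count

# A hypothetical CONSTRUCT query of the kind described above:
#
#   CONSTRUCT { ?p rdf:type target:Person . ?p target:car _:c .
#               _:c rdf:type target:Car .   _:c rdfs:label ?carName . }
#   WHERE     { ?p rdf:type source:Person . ?p source:car ?carName . }

WHERE = [("?p", "rdf:type", "source:Person"),
         ("?p", "source:car", "?carName")]

TEMPLATE = [("?p", "rdf:type", "target:Person"),
            ("?p", "target:car", "_:c"),
            ("_:c", "rdf:type", "target:Car"),
            ("_:c", "rdfs:label", "?carName")]

SOURCE = [("source:Bob", "rdf:type", "source:Person"),
          ("source:Bob", "source:car", "Volvo"),
          ("source:Alice", "rdf:type", "source:Person"),
          ("source:Alice", "source:car", "Mini")]


def match(patterns, graph, binding=None):
    """Yield each variable binding under which all patterns occur in graph."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, graph, candidate)


def construct(template, where, graph):
    """Instantiate the template once per binding, with fresh bnodes each time."""
    fresh = count(1)
    result = set()
    for binding in match(where, graph):
        bnodes = {}  # blank node labels are scoped to a single solution
        for triple in template:
            out = []
            for term in triple:
                if term.startswith("?"):
                    out.append(binding[term])
                elif term.startswith("_:"):
                    if term not in bnodes:
                        bnodes[term] = "_:b%d" % next(fresh)
                    out.append(bnodes[term])
                else:
                    out.append(term)
            result.add(tuple(out))
    return result


for triple in sorted(construct(TEMPLATE, WHERE, SOURCE)):
    print(triple)
```

Each person ends up linked to a freshly created blank node that carries the car name as its rdfs:label, which is the "cars as objects instead of strings" effect. Adding the returned triples to the target model is what treating them as "inferences" amounts to.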

As I said, I am not an expert on ontology mapping and therefore don't want to comment on whether this approach is better than other ontology mapping tools. However, it seems to me that the popularity of SPARQL and the large number of tools that support it make this a very promising idea. We may assume that in the near future most Semantic Web developers will know SPARQL and will therefore not need to learn any other "mapping ontology". Also, SPARQL is supported by optimized query engines, and it is fairly expressive with regard to query filters etc. And if the default expressivity is not enough, you still have property functions.

I guess a lot more research can go into this idea, and lots of new papers could be written, for example on typical design patterns (such as a "property bridge" pattern), on how to edit SPARQL visually, and on how to shape future editions of the SPARQL standard to best meet the ontology mapping use case. For example, it appears to be impossible to create new URIs for the resources in the target ontology; only bnodes can be created on the fly. We therefore added a simple post-processor that uses the rdfs:label to create suitable URIs.
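A post-processing step of this kind could look roughly like the following Python sketch. It is not TopBraid's implementation: `mint_uris`, the "target:" namespace default and the rule for turning a label into a local name are all assumptions made for illustration, and triples are again plain string tuples with blank nodes marked by a leading "_:".

```python
import re


def mint_uris(triples, namespace="target:"):
    """Replace each labelled blank node with a URI derived from its rdfs:label."""
    # Map every blank node that has an rdfs:label to that label.
    labels = {s: o for s, p, o in triples
              if p == "rdfs:label" and s.startswith("_:")}

    def rename(term):
        if term in labels:
            # Collapse runs of non-word characters into underscores.
            local = re.sub(r"\W+", "_", labels[term]).strip("_")
            return namespace + local
        return term  # URIs, literals and unlabelled bnodes stay as they are

    return {(rename(s), p, rename(o)) for s, p, o in triples}


constructed = {
    ("source:Bob", "target:car", "_:b1"),
    ("_:b1", "rdf:type", "target:Car"),
    ("_:b1", "rdfs:label", "Volvo V70"),
}

for triple in sorted(mint_uris(constructed)):
    print(triple)
```

One consequence of this design is that two blank nodes with the same label collapse into the same URI. That is convenient when two people drive the same kind of car, but it would need more care wherever labels are not unique identifiers.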

Given the fact that CONSTRUCT queries can create or infer new triples, it may also be worth investigating whether SPARQL could serve as a rule language, similar to SWRL.

As a side effect of these new features (and its existing support for importing relational databases, UML, XML Schema and Excel, and for operating on Jena, Oracle and Sesame databases), TopBraid Composer is increasingly becoming a data and knowledge integration platform and is no longer "just" an ontology editor.