Saturday, November 10, 2007

Semantic XML: Mapping arbitrary XML documents to OWL

XML is everywhere. Many XML documents contain valuable information that should be lifted into Semantic Web technology. Furthermore, we often encounter situations in which we want to generate XML output from data stored in our ontologies, for example to interoperate with external tools and web services. The new Maestro Edition of TopBraid Composer introduces an XML-to-OWL/RDF mapping capability called Semantic XML.

I made a short screencam video (10 minutes) to illustrate some of Semantic XML's capabilities. In the video I take an XML document, load it into TopBraid and run a SPARQL query over it. Then I fine-tune the generated ontology and show how other files can map into the resulting OWL model. Finally, I show how to load arbitrary HTML documents into a Semantic XML-based XHTML ontology.

Here is also a screenshot of what Semantic XML does:

TopBraid can automatically generate an OWL/RDF ontology from any XML file. Each distinct XML element name is mapped into a class, and the elements themselves become instances of those classes. A datatype property is generated for each attribute. The nesting of the XML elements is stored by means of the composite:child property described in a recent blog entry.
The key idea of Semantic XML is that each generated OWL class and datatype property carries an annotation property (xmap:element for classes, xmap:attribute for datatype properties) that points from the OWL concept to its XML serialization. These annotations are also used when an OWL model needs to be serialized back to XML.
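
To make the mapping rules above more concrete, here is a minimal Python sketch of the idea. It is not TopBraid's implementation: the function name xml_to_triples, the ex: prefix, the instance naming scheme and the representation of triples as plain tuples are all assumptions made for illustration. Only composite:child, xmap:element and xmap:attribute are names taken from this post, and the direction of composite:child (parent to child) is assumed. Text content and XML namespaces are ignored.

```python
import xml.etree.ElementTree as ET


def xml_to_triples(xml_text):
    """Map an XML string to (schema, data) triples, roughly following the
    rules described in the post. All ex: names are hypothetical."""
    root = ET.fromstring(xml_text)
    schema = set()   # class/property definitions plus xmap annotations
    data = []        # instances, attribute values and nesting
    counter = {}     # per-element-name counter used to mint instance ids

    def visit(elem, parent_node):
        # Each distinct element name becomes a class annotated with xmap:element.
        cls = "ex:" + elem.tag
        schema.add((cls, "rdf:type", "owl:Class"))
        schema.add((cls, "xmap:element", elem.tag))

        # The element itself becomes an instance of that class.
        counter[elem.tag] = counter.get(elem.tag, 0) + 1
        node = f"ex:{elem.tag}_{counter[elem.tag]}"
        data.append((node, "rdf:type", cls))

        # Each attribute becomes a datatype property annotated with xmap:attribute.
        for name, value in elem.attrib.items():
            prop = "ex:" + name
            schema.add((prop, "rdf:type", "owl:DatatypeProperty"))
            schema.add((prop, "xmap:attribute", name))
            data.append((node, prop, value))

        # Nesting is recorded with composite:child (direction assumed).
        if parent_node is not None:
            data.append((parent_node, "composite:child", node))
        for child in elem:
            visit(child, node)

    visit(root, None)
    return schema, data
```

Calling xml_to_triples('<library><book title="Dune"/></library>') would yield one class per element name (ex:library, ex:book), one datatype property (ex:title), and a composite:child link from the library instance to the book instance.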
If you import an XML file into an ontology that already contains classes and properties with Semantic XML annotations, then the loader will reuse those. For example, we have defined an XHTML ontology that is automatically loaded when a user opens an HTML file with TopBraid Composer. We plan to provide further standard mapping ontologies for other popular formats such as SVG in the near future.
To summarize, TopBraid can be used to import arbitrary XML documents into OWL so that they can be queried and processed with Semantic Web tools. The mapping is bi-directional and lossless, so files can be loaded, manipulated and saved without losing structural information.
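
The reverse direction can be sketched in the same spirit. The following Python fragment is only an illustration under stated assumptions, not how TopBraid serializes models: triples are modelled as plain tuples, the ex: names and the function triples_to_xml are hypothetical, and sibling order simply follows list order. The post does not describe how Semantic XML records element order or text content, so both are left out here.

```python
import xml.etree.ElementTree as ET


def triples_to_xml(schema, data, root_node):
    """Rebuild an XML string from toy triples, using the xmap annotations
    to recover element and attribute names."""
    element_name = {s: o for s, p, o in schema if p == "xmap:element"}
    attribute_name = {s: o for s, p, o in schema if p == "xmap:attribute"}

    def build(node):
        # Look up the instance's class, then the element name it maps to.
        cls = next(o for s, p, o in data if s == node and p == "rdf:type")
        elem = ET.Element(element_name[cls])
        for s, p, o in data:
            if s != node:
                continue
            if p in attribute_name:
                # Annotated datatype properties become XML attributes.
                elem.set(attribute_name[p], o)
            elif p == "composite:child":
                # Nested instances become child elements.
                elem.append(build(o))
        return elem

    return ET.tostring(build(root_node), encoding="unicode")


# Hypothetical sample model: one library containing one book.
schema = [
    ("ex:library", "xmap:element", "library"),
    ("ex:book", "xmap:element", "book"),
    ("ex:title", "xmap:attribute", "title"),
]
data = [
    ("ex:library_1", "rdf:type", "ex:library"),
    ("ex:book_1", "rdf:type", "ex:book"),
    ("ex:book_1", "ex:title", "Dune"),
    ("ex:library_1", "composite:child", "ex:book_1"),
]
xml_text = triples_to_xml(schema, data, "ex:library_1")
```

Because the element and attribute names come from the annotations rather than from the class and property names, the OWL model can be renamed or fine-tuned without changing the XML that is written back out.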
Note that this mapping approach is very generic and may not provide the best possible mapping for every XML-based language. In some cases, if an XML Schema file is available as a starting point, the XML Schema importer of TopBraid may be more appropriate. On the other hand, the round-tripping capability of Semantic XML makes it the better option when bidirectional interoperability with existing XML-based tools is needed.
Edited (2007-11-17): Note that earlier versions of this blog entry referred to this technology under the name xmap, which is trademarked by another company, so we have changed the name to Semantic XML.

