Composing the Semantic Web: November 2007

Thursday, November 15, 2007

SparqlMotion: A visual semantic web scripting language

The open architecture of semantic web languages like RDF, OWL and SPARQL makes them an excellent choice for data integration problems, also known as mash-ups. Semantic technology tools can be used to bring together heterogeneous data sources, to post-process and filter them, and to query the resulting aggregated data models. One of those tools, TopBraid Composer, provides import capabilities for legacy data in XML, UML, relational databases, spreadsheets, news feeds, HTML pages, etc. Users can edit ontologies to bridge the various data items, and run inference or query engines to get information out. However, going through these steps is typically a manual process that needs to be repeated for each new data source.

SparqlMotion is a new visual language that enables average users to define scripts to import, post-process, query and visualize data using semantic web technology. Users can define and share those scripts as OWL models, based on a dedicated SparqlMotion ontology and module library. The graph editor of Composer's Maestro Edition (or any other OWL editor) can be used to define the data and execution flow of these scripts using drag and drop:

Here is a screencam video (15 minutes) that shows how to create the above SparqlMotion script with TopBraid Composer. Here is the example script in N3 notation. The script loads data from a news feed, post-processes the resulting triples, asks the user to enter a keyword, and then displays all events that contain the keyword in a calendar. The output of the script could also be another file, a spreadsheet, a database or a dynamic model that can be imported into other ontologies.

Each of the nodes in the above diagram represents a data processing step, which must be an instance of a SparqlMotion module class such as sml:LoadNewsFeed. The sm:next relationship specifies the information flow between two modules. For example, the resulting output of the newsfeed loader (RDF triples) is used as input for the data type conversion module below it. The latter module can process or filter the RDF input and pass it on to the next node, and so on. Scripts can branch their data flow and merge RDF input of multiple modules into a single node at any time.
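The branching and merging behavior can be sketched as a small Python program. This is a minimal sketch under stated assumptions, not how SparqlMotion's engine is implemented: modules are plain functions from a set of triples to a set of triples, sm:next links are given as (from, to) pairs, and all module names and ex: resources are hypothetical. It shows that a node's input is the union of the outputs of all its predecessors, and that one output can feed several downstream modules.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def run_script(modules, next_links):
    """Run a toy SparqlMotion-style script.

    modules:    name -> function(frozenset of triples) -> iterable of triples
    next_links: (from_module, to_module) pairs, mirroring sm:next
    Returns the output triples of every module.
    """
    predecessors = defaultdict(set)
    for src, dst in next_links:
        predecessors[dst].add(src)
    # Execute modules so that every predecessor runs before its successors.
    order = TopologicalSorter({m: predecessors[m] for m in modules}).static_order()
    outputs = {}
    for name in order:
        # Merge: the input of a module is the union of all upstream outputs.
        merged = set()
        for p in predecessors[name]:
            merged |= outputs[p]
        outputs[name] = set(modules[name](frozenset(merged)))
    return outputs


# Hypothetical modules: two loaders merge into one node, which then branches.
modules = {
    "load_feed_a": lambda _: {("ex:e1", "ex:title", "Semantic Web meetup")},
    "load_feed_b": lambda _: {("ex:e2", "ex:title", "SPARQL tutorial")},
    "merge": lambda triples: triples,
    "only_sparql": lambda triples: {t for t in triples if "SPARQL" in t[2]},
    "only_meetups": lambda triples: {t for t in triples if "meetup" in t[2]},
}
links = [
    ("load_feed_a", "merge"), ("load_feed_b", "merge"),
    ("merge", "only_sparql"), ("merge", "only_meetups"),
]

out = run_script(modules, links)
print(sorted(out["only_sparql"]))
```

Using a topological order means a module runs only once all of its inputs are available, which is what makes merging several branches into a single node well defined.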

Two information formats are currently supported: RDF and XML. We provide translation modules based on our Semantic XML algorithm that can convert between RDF and XML at any time. In addition to these formats, modules can bind variables. For example, a user input module such as "Enter keyword" above can prompt the user to enter a string and then pass that string literal to the following modules in a variable such as "keyword". Succeeding modules can reference this variable, for example, in SPARQL queries.
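Passing a bound variable such as "keyword" on to a later SPARQL query can be sketched in a few lines of Python. The sketch assumes that the binding is applied by substituting the value into the query text as a quoted string literal. SparqlMotion may pre-bind variables differently, and the function names enter_keyword and bind_query are hypothetical.

```python
import re


def enter_keyword(bindings, answer):
    """Hypothetical user-input module: returns a copy of the bindings with ?keyword set."""
    updated = dict(bindings)
    updated["keyword"] = answer
    return updated


def bind_query(query, bindings):
    """Replace each bound ?variable in a SPARQL string with a quoted string literal."""
    for name, value in bindings.items():
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        literal = f'"{escaped}"'
        # \b keeps ?keyword from also matching a longer name such as ?keywords
        query = re.sub(r"\?" + re.escape(name) + r"\b", lambda _m: literal, query)
    return query


QUERY = """SELECT ?event WHERE {
  ?event <http://purl.org/dc/elements/1.1/title> ?title .
  FILTER regex(?title, ?keyword, "i")
}"""

bindings = enter_keyword({}, "SPARQL")
bound = bind_query(QUERY, bindings)
print(bound)
```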

SPARQL is the central language of SparqlMotion. Many modules (such as those that display data on a calendar or a Google map) use a SPARQL query to select which resources to display. There is also an iteration module that repeats other modules for each result row of a SPARQL select clause. Finally, SPARQL's CONSTRUCT keyword is used heavily to transform and filter RDF data.
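How a CONSTRUCT query transforms and filters triples, and how an iteration module repeats a step per SELECT row, can be illustrated without a SPARQL engine. This is a minimal Python sketch under stated assumptions: triples are plain string tuples, terms starting with "?" are variables, and only basic graph patterns are supported (no FILTER, OPTIONAL, etc.). The ex: and cal: names are hypothetical.

```python
def match(patterns, triples, binding=None):
    """Yield each variable binding that satisfies all triple patterns (a basic graph pattern)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable matches anything, but must agree with earlier bindings.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, candidate)


def construct(template, where, triples):
    """CONSTRUCT-style transform: instantiate the template once per solution of WHERE."""
    result = set()
    for solution in match(where, triples):
        for pattern in template:
            result.add(tuple(solution.get(term, term) for term in pattern))
    return result


data = {
    ("ex:e1", "ex:title", "Meetup"),
    ("ex:e1", "ex:date", "2007-11-20"),
    ("ex:e2", "ex:title", "Tutorial"),
}

# CONSTRUCT { ?e cal:summary ?t } WHERE { ?e ex:title ?t . ?e ex:date ?d }
# Renames the property (transform) and drops events without a date (filter).
calendar = construct(
    [("?e", "cal:summary", "?t")],
    [("?e", "ex:title", "?t"), ("?e", "ex:date", "?d")],
    data,
)

# Iteration: run a (here trivial) sub-step once per SELECT result row.
log = []
for row in match([("?e", "ex:title", "?t")], data):
    log.append(f"{row['?e']}: {row['?t']}")
```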

The SparqlMotion module library has been growing rapidly since we started using it in customer projects. We are also working on a web-based graph editor in Flex based on TopBraid Ensemble's graphing capabilities. This will remind some people of Yahoo Pipes. The version currently included in TopBraid Composer Maestro is still alpha software while we learn more about SparqlMotion design patterns and add support for best practices. We expect to incrementally roll out many more features over the next few months. In any case, the system is available for download if you want to give it a try. Make sure to watch the video before exploring this exciting space.

Wednesday, November 14, 2007

Creating documents with SPARQL and JSP

My co-workers at TopQuadrant recently had a deadline to create some deliverables for a large customer project (for a national space agency :) ). A large fraction of the work in this project is actually ontology design, and the deliverables were Word documents that described and explained these ontologies. Tired of endless hours of manual work, my manager wished he had a "generate-document-from-ontology" button.

On that same day we implemented a prototype document generator that allows users to embed SPARQL queries into HTML templates, and then lets the system insert the resulting variable bindings into certain spots in the templates. The basic idea is that certain blocks of text (such as a row in an HTML table) are repeated for each result row of the SPARQL query. Furthermore, loops over SPARQL queries can be nested so that an inner loop can reference variables bound by an outer loop.
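The repeat-per-row and nested-loop idea can be sketched in Python. The prototype's actual template syntax is not described here, so this sketch makes its own assumptions: a template is a Python list of text fragments and hypothetical Loop objects, and plain Python functions stand in for the SPARQL SELECT queries. It is neither the prototype nor the JSP taglib, only an illustration of how inner loops see outer variables.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Loop:
    """Repeat `body` once per result row of `query` (a stand-in for a SPARQL SELECT)."""
    query: Callable[[dict], list]
    body: list


def render(parts, bindings):
    out = []
    for part in parts:
        if isinstance(part, Loop):
            for row in part.query(bindings):
                # Inner loops see the outer variables plus the current row.
                out.append(render(part.body, {**bindings, **row}))
        else:
            out.append(part.format(**bindings))
    return "".join(out)


# Hypothetical toy ontology: class -> its properties.
ONTOLOGY = {"ex:Person": ["ex:name", "ex:age"], "ex:Mission": ["ex:launchDate"]}


def classes_query(b):
    return [{"cls": c} for c in sorted(ONTOLOGY)]


def properties_query(b):
    # Uses ?cls, which is bound by the enclosing loop.
    return [{"prop": p} for p in ONTOLOGY[b["cls"]]]


TEMPLATE = [
    "<table>\n",
    Loop(classes_query, [
        "<tr><th>{cls}</th></tr>\n",
        Loop(properties_query, ["<tr><td>{cls}</td><td>{prop}</td></tr>\n"]),
    ]),
    "</table>\n",
]

html = render(TEMPLATE, {})
print(html)
```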

This turned out to be a really useful feature, so I generalized this to a new tool feature based on a generic document generation framework, Java Server Pages (JSP). Since I had never implemented anything on top of JSP, I did a quick search on the web, revealing David Powell's SPARQL JSP taglib. This small but fine open-source project turned out to be a very helpful starting point for the implementation of TopBraid Composer's new document generation facility. Since TBC's Maestro Edition comes with its own integrated Java web server, we can now execute Java Server Pages within the development tool.

In a nutshell, this feature can be used to create arbitrary text documents (such as HTML or XML files) from the RDF model that is currently open in Composer. The user selects a JSP page (which can be edited with Eclipse/TBC or tools such as DreamWeaver) and then the output file. The system compiles the JSP page internally into a Servlet and then runs it, writing the result into a new file in the Eclipse workspace. With the TopBraid Live server, these pages can also be put online to produce dynamic web pages.

Here is an example JSP document with embedded SPARQL code. When executed over this ontology, you get this output.

Monday, November 12, 2007

BIRT: Creating SPARQL-based charts and reports

Being based on the Eclipse platform, TopBraid Composer can seamlessly integrate with other Eclipse-based tools and services. One of the most complex Eclipse plug-ins is BIRT, an open source Eclipse-based reporting system that can be used to generate charts and other reports from input data. BIRT typically takes its input from relational databases or spreadsheets, but its open architecture allows programmers to plug in arbitrary tabular data sources.

A simple way of generating tabular data from an OWL/RDF data model is via SPARQL Select queries, which deliver result sets in rows with one or multiple columns. TopBraid Composer's Maestro Edition now provides a powerful interface between any OWL/RDF data source and BIRT. The following screenshot illustrates the kind of output that BIRT can create.
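The mapping from a SELECT result set to a table can be illustrated by flattening a result in the SPARQL JSON results format into CSV: one column per projected variable, one row per solution. This sketch does not use BIRT's data source API and is not how the Maestro connector works internally; the sample data and the function name are hypothetical.

```python
import csv
import io
import json


def select_results_to_csv(results_json):
    """Flatten a SPARQL SELECT result (JSON results format) into CSV text."""
    data = json.loads(results_json)
    columns = data["head"]["vars"]  # one column per projected variable
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for solution in data["results"]["bindings"]:  # one row per solution
        # Unbound variables (e.g. from OPTIONAL) become empty cells.
        writer.writerow([solution.get(v, {}).get("value", "") for v in columns])
    return buf.getvalue()


SAMPLE = json.dumps({
    "head": {"vars": ["product", "sales"]},
    "results": {"bindings": [
        {"product": {"type": "uri", "value": "http://example.org/WidgetA"},
         "sales": {"type": "literal", "value": "42"}},
        {"product": {"type": "uri", "value": "http://example.org/WidgetB"}},
    ]},
})

table = select_results_to_csv(SAMPLE)
print(table)
```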

Details on how to use BIRT are available in TopBraid's help pages. I also made a quick screencam demo (3 minutes) showing BIRT.

Saturday, November 10, 2007

Semantic XML: Mapping arbitrary XML documents to OWL

XML is everywhere. Many XML documents contain valuable information that should be lifted into Semantic Web technology. Furthermore, we often encounter situations in which we want to generate XML output from data stored in our ontologies, for example to interoperate with external tools and web services. The new Maestro Edition of TopBraid Composer introduces an XML-to-OWL/RDF mapping capability called Semantic XML.

I made a small screencam video (10 minutes) to illustrate some of Semantic XML's capabilities. In the video I take an XML document, load it into TopBraid and run a SPARQL query over it. Then I fine-tune the generated ontology and show how other files can be mapped into the resulting OWL model. Finally I show how to load arbitrary HTML documents into a Semantic XML-based XHTML ontology.

Here is also a screenshot of what Semantic XML does:

TopBraid can automatically generate an OWL/RDF ontology from any XML file. Each distinct XML element name is mapped into a class, and the elements themselves become instances of those classes. A datatype property is generated for each attribute. The nesting of the XML elements is stored by means of the composite:child property described in a recent blog entry.
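The mapping rules just described can be sketched with Python's ElementTree. This is a simplified illustration under stated assumptions, not TopBraid's implementation: generated names use a hypothetical ex: prefix, instance IDs are numbered in document order, XML namespaces are not handled, and text content and sibling order are ignored (a lossless mapping would also need to capture both).

```python
import itertools
import xml.etree.ElementTree as ET


def xml_to_triples(xml_text):
    """Map elements to classes/instances, attributes to datatype properties,
    and nesting to composite:child triples."""
    root = ET.fromstring(xml_text)
    out = set()
    ids = itertools.count(1)

    def visit(elem):
        cls = "ex:" + elem.tag  # one class per distinct element name
        out.add((cls, "rdf:type", "owl:Class"))
        out.add((cls, "xmap:element", elem.tag))
        inst = f"ex:{elem.tag}_{next(ids)}"  # each element becomes an instance
        out.add((inst, "rdf:type", cls))
        for name, value in elem.attrib.items():
            prop = "ex:" + name  # one datatype property per attribute name
            out.add((prop, "rdf:type", "owl:DatatypeProperty"))
            out.add((prop, "xmap:attribute", name))
            out.add((inst, prop, value))
        for child in elem:
            out.add((inst, "composite:child", visit(child)))
        return inst

    visit(root)
    return out


graph = xml_to_triples('<catalog><book id="b1"><title/></book><book id="b2"/></catalog>')
```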

The key idea of Semantic XML is that each of the generated OWL classes and datatype properties is annotated with an annotation property (xmap:element and xmap:attribute, respectively) that points from the OWL concept to the XML serialization. These annotations are also used if an OWL model needs to be serialized back to XML format.
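The reverse direction can be sketched from the annotations alone: each instance's class yields the element name via xmap:element, each annotated property yields an attribute name via xmap:attribute, and composite:child yields the nesting. This Python sketch uses hand-written input triples with hypothetical ex: names; it orders children by sorting their IDs, which is only an assumption for determinism (it would misorder book_10 before book_2) and not how TopBraid preserves order.

```python
import xml.etree.ElementTree as ET


def triples_to_xml(triples, root):
    """Serialize an instance tree back to XML using the xmap annotations."""
    element_of = {s: o for s, p, o in triples if p == "xmap:element"}      # class -> tag
    attribute_of = {s: o for s, p, o in triples if p == "xmap:attribute"}  # property -> attribute
    type_of = {s: o for s, p, o in triples if p == "rdf:type" and o in element_of}

    def build(inst):
        elem = ET.Element(element_of[type_of[inst]])
        for s, p, o in sorted(triples):
            if s == inst and p in attribute_of:
                elem.set(attribute_of[p], o)
        children = sorted(o for s, p, o in triples if s == inst and p == "composite:child")
        for child in children:
            elem.append(build(child))
        return elem

    return ET.tostring(build(root), encoding="unicode")


annotated = {
    ("ex:catalog", "xmap:element", "catalog"),
    ("ex:book", "xmap:element", "book"),
    ("ex:id", "xmap:attribute", "id"),
    ("ex:catalog_1", "rdf:type", "ex:catalog"),
    ("ex:book_2", "rdf:type", "ex:book"),
    ("ex:book_3", "rdf:type", "ex:book"),
    ("ex:book_2", "ex:id", "b1"),
    ("ex:book_3", "ex:id", "b2"),
    ("ex:catalog_1", "composite:child", "ex:book_2"),
    ("ex:catalog_1", "composite:child", "ex:book_3"),
}

xml_out = triples_to_xml(annotated, "ex:catalog_1")
print(xml_out)
```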

If you import an XML file into an ontology that already contains classes and properties with Semantic XML annotations, then the loader will reuse those. For example, we have defined an XHTML ontology that is automatically loaded when a user opens an HTML file with TopBraid Composer. We plan to provide further standard mapping ontologies for other popular formats such as SVG in the near future.

To summarize, TopBraid can be used to import arbitrary XML documents into OWL so that they can be queried and processed with semantic web tools. The mapping is bi-directional and lossless so that files can be loaded, manipulated and saved without losing structural information.

Note that this mapping approach is very generic, but may not provide the best possible mapping for every XML-based language. In some cases, if starting with an XML Schema file is possible, the XML Schema importer of TopBraid may be more appropriate. On the other hand, the round-tripping capability of Semantic XML makes it the better option when bidirectional interoperability with existing XML-based tools is needed.

Edited (2007-11-17): Note that earlier versions of this blog entry referred to this technology under the name xmap, which is trademarked by another company, so we have renamed it Semantic XML.

Friday, November 09, 2007

TopBraid Composer - Maestro Edition

I am thrilled to announce TopBraid Composer 2.4.0. This release features significant usability improvements (graph editing and support for multiple editors/forms for the same file), and also Oracle 11g integration. Please check our web page for a detailed list of changes.

With version 2.4.0 we are also launching an extended version of TopBraid Composer, called the Maestro Edition. Maestro includes all features of Standard TBC, but also comes with many new and extremely powerful capabilities that are not present in the Standard Edition. A detailed list of new features of Maestro is available online, but here is a summary:

A built-in TopBraid Live test server for rapid application development.
Support for document generation with semantic Java Server Pages.
SparqlMotion: A new visual semantic web scripting language comparable to Yahoo Pipes.
Semantic XML: A new technology to import arbitrary XML files, and to query, edit and export them back to XML.
BIRT: A SPARQL-based visual report generation and charting tool kit.
Email import to OWL for semantic analysis of emails.

I will write more about all those capabilities in the next few days. Stay tuned!

The Maestro Edition is available for download now and we have reset the evaluation period so that all users can get a fresh view on the tool.
