Thursday, November 15, 2007

SparqlMotion: A visual semantic web scripting language

The open architecture of semantic web languages like RDF, OWL and SPARQL make them an excellent choice for data integration problems, aka mash-ups. Semantic technology tools can be used to bring together heterogeneous data sources, to post-process and filter them, and to query the resulting aggregated data models. One of those tools, TopBraid Composer, provides import capabilities for legacy data in XML, UML, relational databases, spreadsheets, news feeds, HTML pages etc. Users can edit ontologies to bridge the various data items, and run inference or query engines to get information out. However, going through these steps is typically a manual process that needs to be repeated for each new data source.

SparqlMotion is a new visual language that enables average users to define scripts to import, post-process, query and visualize data using semantic web technology. Users can define and share those scripts as OWL models, based on a dedicated SparqlMotion ontology and module library. The graph editor of Composer's Maestro Edition (or any other OWL editor) can be used to define the data and execution flow of these scripts using drag and drop:




Here is a screencam video (15 minutes) that shows how to create the above SparqlMotion script with TopBraid Composer. Here is the example script in N3 notation. The script loads data from a news feed, post-processes the resulting triples, ask the user to enter a keyword, and then displays all events that contain the keyword in a calendar. The output of the script could also be another file, a spreadsheet, a database or a dynamic model that can be imported into other ontologies.



Each of the nodes in the above diagram represents a data processing step, which must be an instance of a SparqlMotion module class such as sml:LoadNewsFeed. The sm:next relationship specifies the information flow between two modules. For example, the resulting output of the newsfeed loader (RDF triples) is used as input for the data type conversion module below it. The latter module can process/filter the RDF input and pass it on to the next node etc. Scripts can branch their data flow and merge RDF input of multiple modules into a single node at any time.



Two information formats are currently supported: RDF and XML. We provide translation modules based on our Semantic XML algorithm that can convert between RDF and XML at any time. In addition to these formats, modules can bind variables. For example, a user input module such as "Enter keyword" above can prompt the user to enter a string and then pass that string literal to the following modules in a variable such as "keyword". Succeeding modules can reference this variable, for example, in SPARQL queries.



SPARQL is the central language of SparqlMotion. Many modules (such as those that display data on a calendar or a Google map) use a SPARQL query to select which resources to display. There is also an iteration module that repeats other modules for each result row of a SPARQL select clause. Finally, SPARQL's CONSTRUCT keyword is used heavily to transform and filter RDF data.



The SparqlMotion modules library is growing rapidly since we started using it in customer projects. We are also working on a web-based graph editor in Flex based on TopBraid Ensemble's graphing capabilities. This will remind some people of Yahoo Pipes. The current version included in TopBraid Composer Maestro is rather alpha software as we better understand SparqlMotion design patterns and add support for best practices. We expect to incrementally roll out many more features over the next few months. In any case, the system is available for download if you want to give it a try. Make sure to watch the video before exploring this exciting space.

0 Comments:

Post a Comment

<< Home