Scalability is one of the most pressing requirements of almost all the Semantic Web projects that I have been involved in with TopQuadrant. Realistic applications require the ability to work with millions of RDF triples. These triples may include data and metadata about engineering drawings, multimedia, business knowledge, mathematical simulations, search thesauri etc. The designers and programmers of these systems ask us whether we think that Semantic Web technology will scale and perform well under these heavy loads. After all, everything in RDF (and OWL) is stored as triples, comparable to a single large database table with three columns for subjects, predicates and objects.
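To make the single-table comparison concrete, here is a minimal sketch in Python that uses an in-memory SQLite database as a stand-in for a triple store. It is only an illustration of the three-column idea, not how Sesame, Jena or AllegroGraph store data internally. The table layout, the index choices, the `ex:` identifiers and the helper names `build_triple_table` and `match` are all assumptions made up for this example.

```python
import sqlite3


def build_triple_table(triples):
    """Load (subject, predicate, object) tuples into one three-column table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    # Assumption: two covering indexes, so lookups by subject or by predicate
    # do not have to scan the whole table as it grows.
    conn.execute("CREATE INDEX idx_spo ON triples (subject, predicate, object)")
    conn.execute("CREATE INDEX idx_pos ON triples (predicate, object, subject)")
    return conn


def match(conn, subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern; None acts as a wildcard."""
    clauses, params = [], []
    for column, value in (
        ("subject", subject),
        ("predicate", predicate),
        ("object", obj),
    ):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT subject, predicate, object FROM triples"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY subject, predicate, object"
    return conn.execute(sql, params).fetchall()


# Hypothetical sample data about an engineering drawing.
EXAMPLE_TRIPLES = [
    ("ex:Drawing42", "rdf:type", "ex:EngineeringDrawing"),
    ("ex:Drawing42", "ex:depicts", "ex:Pump7"),
    ("ex:Pump7", "rdf:type", "ex:Pump"),
]

if __name__ == "__main__":
    store = build_triple_table(EXAMPLE_TRIPLES)
    for row in match(store, predicate="rdf:type"):
        print(row)
```

Every query against such a table is a pattern over the same three columns, which is why indexing and join strategy matter so much once the table holds millions of rows.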
Fortunately, a lot of smart people around the world are putting their energy into developing scalable triple stores. In the Java open-source world, Sesame and Jena are the best known choices, and Sesame's database support in particular is known to have excellent performance characteristics. The Jena folks are working on optimizations. Another scalable open-source RDF database is Mulgara, formerly known as Kowari, which I haven't used myself yet.
It is a good sign of a healthy software market that more and more commercial triple stores appear as well. While open source is great, many customers prefer to purchase a professional product license so that they have someone to hold accountable and to get help.
Franz, Inc. has been in the software business for quite a while, and is particularly well known as a world-leading provider of Common Lisp products. While Lisp always seems to give the impression of being an academic language that never really made it into the mass market, Franz have optimized their Lisp compiler to astonishing performance. More recently they have started to use their Lisp platform to develop Semantic Web technology solutions.
AllegroGraph is one of their flagship Semantic technology products, and they have made great progress with it in recent months. From what I have seen, AllegroGraph has really good performance and is now (as far as I know) the best professional RDF triple store on the market. They even have a free entry-level version of AllegroGraph, which scales up to 50 million triples.
The new TopBraid Composer 1.5 has full support for AllegroGraph through an optimized Java bridge. This enables TopBraid users to build very large models, and to convert data from other sources into triple stores. "Large" ontologies such as the infamous NCI ontology do not even come close to the orders of magnitude that these guys are working on. While the price tag of both products may not make them an option for everyone, AllegroGraph is certainly a tool to watch, especially if you are interested in an integrated solution that combines some of the best-of-breed solutions from ontology design to deployment.