Ga direct naar


Semantic web marvels in a relational database - part II: Comparing alternatives

Monday 15 June 2009 20:09

In this article I will compare the basic technical details of current relational database alternatives.

By Patrick van Bergen

In the first article in this series I explained the relational database mapping of our semantic web implementation. In this article I will place this work into perspective by exploring related techniques.

The last few years developers are looking for ways to overcome certain shortcomings of relational database systems. RDBMSes are general purpose data stores that are flexible enough to store any type of data. However, these are several cases in which the relational model proves inefficient:

  • An object has many attributes (100+), many of which are optional. It would be wasting space to store all these attributes in separate columns.
  • Many attributes with multiple values. Since each of these attributes needs a separate table, the object data will be distributed over many tables. This is inefficient in terms of development time, maintenance, as well as query time.
  • Class inheritance. Since most software is Object Oriented these days the objects in code will need to be mapped to the database structure. In the case of class inheritance, where attributes are inherited from superclasses, it is a big problem to store objects in, and query them from, an RDBMS efficiently.
  • Types and attributes are not objects. In an RDBMS the data of a model is separate from the metadata (attribute names, datatypes, foreign key constraints, etc.). Types and attributes are not like normal objects. This is inefficient in areas where types and attributes need to be added, changed and removed regularly, just like any other data. It is inefficient to write separate code to manipulate and query types and attributes. In short, first order predicate logic no longer suffices for many new applications. The second order is needed.
  • Scalability. Is an aspect often named as the reason to leave RDBMS. However, since relational databases have been optimized for decades, they do scale. Nevertheless, in this age of global, real-time webapplications, techniques provided by RDBMS manufacturers may prove to be inadequate, or simply too expensive.

In the following I will provide a simple understanding of the basic principles of alternative database techniques, along with some pointers to more in-depth information. I hope you will forgive me my non-expert view on these subjects. For detailed information on any given subject, look elsewhere. This article is meant to be just a quick overview, aimed to waken some concepts provided by the examples.

RDBMS, or Row-Oriented database

In a relational database management system, pieces of data are grouped together in a record. In this article I will consider the case where the data stored is meant to represent the attributes of an object. Seen this way, a record is a group of attributes of an object. Here's an example of such a table of objects:

object id color width height name
3 red 100 100 my box
4 green 50 500 old beam


Metadata is shown in gray. Keys / foreign keys are shown in bold typeface.

Need more attributes? Add more columns. Need an attribute with multiple values? Add a table and link it to the first. The RDBMS chooses speed over flexibility. Speed was a big deal 40 years ago, when this database type was designed. And it still is a big deal today. For large amounts of simple data, there is absolutely no need to leave this model.

Semantic net

Storing semantic information as triples is an old idea in the field of Knowledge Representation. As early as 1956, semantic nets were used for this purpose. In this technique the relations between objects are represented by plain labels. Each "record" stores only a single attribute, or one element of an array-attribute. Most notable are the absense of metadata and the fact that object data is distributed over many records.

object id predicate value
3 color red
3 width 100
3 height 100
3 name

my box

4 color green
4 width 50
4 height 500
4 name old beam


Need more attributes? No need to change the table structure. Need an attribute with multiple values? Same thing.

 

 

Entity-Attribute-Value

The Entity-Attribute-Value model of knowledge representation uses some form of triples, just like the semantic web. Its primary use is described by Wikipedia as "Entity-Attribute-Value model (EAV), also known as object-attribute-value model and open schema is a data model that is used in circumstances where the number of attributes (properties, parameters) that can be used to describe a thing (an "entity" or "object") is potentially very vast, but the number that will actually apply to a given entity is relatively modest. In mathematics, this model is known as a sparse matrix."

Attribute metadata is stored in separate attribute tables, which are not triples. EAV is a sort of middle between semantic nets and semantic web: attributes have explicit properties, but these are fixed in amount.

EAV can be used to model classes and relationships as in EAV/CR.

EAV is used in Cloud computing databases like Amazon's SimpleDB and Google's App Engine.

object id attribute id value
3 1 red
3 2 100
3 3 100
3 4

my box

4 1 green
4 2 50
4 3 500
4 4 old beam

 

attribute id name datatype unique
1 color char(6) true
2 width double true
3 height double true
4 name string true


Need more attributes? Add them in the attribute table. Attributes with multiple values? No extra work. The schema of the attributes is stored in the database explicitly, but attributes are treated different from the objects.

Column-Oriented databases

From wikipedia: "A column-oriented DBMS is a database management system (DBMS) which stores its content by column rather than by row."

object id color
3 red
4 green

 

object id width
3 100
4 50

 

object id height
3 100
4 500

 

object id name
3 my box
4 old beam

 

Google's BigTable is based, in part, on column-orientation. Their tables use reversed URI's as object and column identifiers, and have a "third dimension" in that older revisions of the data are stored in the same table.

References:

Correlation databases

A correlation database is "value based": every constant value is stored only once. All these values are stored together, except that values are grouped by datatype. All values are indexed. "In addition to typical data values, the data value store contains a special type of data for storing relationships between tables...but with a CDBMS, the relationship is known by the dictionary and stored as a data value."

I have not found a clear example of what this datastructure looks like, but we can infer that the internal structure must look something like the following. Note: I may be completely wrong here!

The values-table (actually there is one table per major datatype; i.e. integers, strings, dates, etc.)

value id value
1 red
2 green
3 100
4 50
5 500
6 my box
7 old beam
8 <object 1>
9 <object 2>
10 <relationship color>
11 <relationship width>
12 <relationship height>
13 <relationship name>

 

and then there is at least a table containing the relationships (or: "associations") between the values. The relationships are stored as values themselves:

value id 1 association value id 2
8 10 1
8 11 3
8 12 3
8 13 6
9 10 2
9 11 4
9 12 5
9 13 7


References:

Hierarchical model, Network model, Navigational database

For the sake of completeness I have to name these models. The hierarchical model stores tree-like structures only, requiring each piece of data to have a single "parent". The network model allows a piece of data to have multiple parents. Both models were superseded by the relational model, but they are still used for special-purpose applications. A navigational database allows to traverse such trees / DAGs by following paths.

 

 

Object-Oriented databases

In an object-oriented database all attributes of a class are stored together. From what I've read on the internet I conclude that the actual storage structure of an OODBMS is sort of an implementation detail. This means that performance characteristics of the database will depend heavily on the type of implementation chosen. Development of this model was first in the hands of the ODMG, but control was transferred to the Java Community Proces that build the Java Data Objects specification. This specification names the conditions for such a database, but does not guide the implementation.

Some special properties:

  • Class inheritance is supported in the data model.
  • Object nesting: an object can contain (not just link to) other objects

Mapped to an RDBMS, a so called ORM (Object Relational Mapping), objects are commonly stored in a standard relational way: one column per (single valued) attribute. To implement inheritance, the columns of all base classes of an object are joined. This can be done at design-time (create a big table containing the columns of all parent classes) or at query-time (join parent class tables).

class id object id color width height name
101 3 red 100 100 my box
101 4 green 50 500 old beam

 

class id class name parent class
101 Object  
102 Bar 101


References:

Document based databases

A document based database is a different beast altogether. It lacks a database schema completely, and a complete object is stored in a single cell. In the case of CouchDB, this is done by encoding the object (or: document) in JSON. Real-time querying of the source table is thus impossible, one needs to create views on the data.

object id document
3 {"color":"red","width":100,"height":100,"name":"my box"}
4 {"color":"green","width":50,"height":500,"name":"old beam"}


References:

Triplestores

Some triplestores are publicly available. Commonly they have an RDF interface. Their performance can be measured using the Lehigh University Benchmark (LUBM). The most advanced open source triplestores are Sesame, and ARC.

object id attribute id value
3 101 red
3 102 100
3 103 100
3 104 my box
4 101 green
4 102 50
4 103 500
4 104 old beam
101 104 color
102 104 width
103 104 height
104 104 name

 

Very little has been made public about the way triplestores are implemented in a relational database. A laudable exception to this is the Jena2 database schema. Unfortunately, the schema appears to be very inefficient, since the URIs are not indexed but are used literally.

A charmingly simple implementation that seems resource intensive was made for expasy4j: triples are stored in a single table, but for query speed, a single column is reserved for each separate datatype.

Another, somewhat better implementation was made for OpenLink Virtuoso: it uses indexed uris, but all constants are placed in a single field datatyped "ANY".

Conclusion

I hope this article has shown you a little bit why developers are looking for alternatives for the familiar RDBMS and which forms these currently have taken. Currently the field is quite diverse and developments are being made by many different parties. It will be interesting to see how this evolves and which alternative(s) will eventually become the successor of the relational database.

« Back

Reactions on "Semantic web marvels in a relational database - part II: Comparing alternatives"

Danny
Placed on: 06-15-2009 23:01
It looks like graph database are a kind of navigational database. These seem to get quite a bit of hype lately. It's not hard to see why, graph databases are very easy to understand and use, and offer a large amount of flexibility. As far as I can see, it can be used in almost any situation where an RDBMS is currently used.
Eli Konky
Placed on: 06-16-2009 12:23
Main issue with non "row oriented" rdbms implementation is the need to implement the whole query language - meaning you need to translate a conceptual query into an SQL query. This task is not that easy when using a non row oriented implementation. couchdb solved it, as mentioned in the text, using views.

Log in to comment on news articles.

Procurios zoekt PHP webdevelopers. Werk aan het Procurios Webplatform en klantprojecten! Zie http://www.slimmerwerkenbijprocurios.nl/.


Hello!

We are employees at Procurios, a full-service webdevelopment company located in the Netherlands. We are experts at building portals, websites, intranets and extranets, based on an in-house developed framework. You can find out more about Procurios and our products, might you be interested.

This weblog is built and maintained by us. We love to share our ideas, thoughts and interests with you through our weblog. If you want to contact us, please feel free to use the contact form!


Showcase

  • Klantcase: Bestseller
  • Klantcase: de ChristenUnie
  • Klantcase: Evangelische Omroep
  • Klantcase: de Keurslager
  • Klantcase: New York Pizza
  • Klantcase: Verhage

Snelkoppelingen