Tech Reports

ULCS-09-003

Hyperset Approach to Semi-structured Databases and the Experimental Implementation of the Query Language Delta (PhD Thesis)

Richard Molyneux


Abstract

This thesis presents practical suggestions towards the implementation of the hyperset approach to semi-structured databases and the associated query language Delta. This work can be characterised as part of a top-down approach to semi-structured databases, from theory to practice.

Over the last decade the rise of the World-Wide Web has lead to the suggestion for a shift from structured relational databases to semi-structured databases, which can query distributed and heterogeneous data having unfixed/non-rigid structure in contrast to ordinary relational databases. In principle, the World-Wide Web can be considered as a large distributed semi-structured database where arbitrary hyperlinking between Web pages can be interpreted as graph edges (inspiring the synonym ‘Web-like’ for ‘semi-structured’ databases also called here WDB). In fact, most approaches to semi-structured databases are based on graphs, whereas the hyperset approach presented here represents such graphs as systems of set equations. This is more than just a style of notation, but rather a style of thought and the corresponding mathematical background leads to considerable differences with other approaches to semi-structured databases. The hyperset approach to such databases and to querying them has clear semantics based on the well established tradition of set theory and logic, and, in particular, on non-well-founded set theory because semi-structured data allow arbitrary graphs and hence cycles.

The main original part of this work consisted in implementation of the hyperset Delta-query language to semi-structured databases, including worked example queries. In fact, the goal was to demonstrate the practical details of this approach and language. The required development of an extended, practical version of the language based on the existing theoretical version, and the corresponding operational semantics. Here we present detailed description of the most essential steps of the implementation. Another crucial problem for this approach was to demonstrate how to deal in reality with the concept of the equality relation between (hyper)sets, which is computationally realised by the bisimulation relation. In fact, this expensive procedure, especially in the case of distributed semi-structured data, required some additional theoretical considerations and practical suggestions for efficient implementation. To this end the “local/global” strategy for computing the bisimulation relation over distributed semi-structured data was developed and its efficiency was experimentally confirmed.

Finally, the XML-WDB format for representing any distributed WDB as system of set equations was developed so that arbitrary XML elements can participate and, hence, queried by the Delta-language.

The query system with the syntax of the language and several example queries from this thesis is available online at

[Full Paper]