ON TEMPORAL DATABASE DESIGN IN SQL
with Hugh Darwen

 

 

 

From: Perry Valdez

Date: 14 Jun 2005

 

May I ask, is there a recommended way to design a historical (temporal) database using SQL  products? (With ad hoc workarounds, maybe?)

 

I found the following example of a temporal database:

 

Department:

D#    Manager  Start     End      

___   _______  ________  ________

100   Joe      01/01/89  07/09/91 

100   Jim      08/09/91  NULL     

200   Jane     05/04/88  02/06/90 

200   Jack     03/06/90  NULL     

 

Employee:

Employee  D#    Start     End      

________  ___   ________  ________

Fred      100   03/05/87  04/02/90 

Fred      200   05/02/90  NULL     

 

 

Result (Join of "Department" and "Employee" tables):

D#    Manager  Employee   Start     End       

___   _______  ________   ________  ________

100   Joe      Fred       01/01/89  04/02/90

200   Jane     Fred       05/02/90  02/06/90 

200   Jack     Fred       03/06/90  NULL

 

The NULLs there indicate that the data is still current. But I'm worried about this kind of database design because of the presence of NULLs. So what are the better alternatives to represent data that's

still current? Are timestamp columns a better alternative?

 

It seems that I cannot totally avoid using stored procedures and triggers in designing temporal databases in SQL. But at least I can still formulate and implement the overlap constraints there correctly and "natively" (i.e., without resorting to triggers and stored procedures), right?

 

 

From: Hugh Darwen

 

The book Temporal Data and The Relational Model doesn't help you with the current state of the art in SQL, I'm afraid. Rather, we wrote it to help people express requirements on SQL to the vendors of SQL DBMSs, as well as to show how the problems of temporal data might be addressed in a truly relational database language. (SQL is so badly designed, and so anti-relational, that nobody has yet come up with a good proposal for extending that particular language to address these problems.)

 

You need to consider not only how to design the database, but also how to write queries and express constraints on temporal data. And you are 100% right, in my opinion to worry about those NULLs.

 

For example, consider the constraint required to express the fact that no department can have more than one manager at any point in time. You could call that a temporal counterpart of the regular PRIMARY KEY constraint, but you'd be hard pushed to express it in standard SQL, and even harder pushed in most of the SQL products available today.

 

For another example, consider the query, "For each department, show the intervals throughout which it was or is without a manager."

 

For another example, consider the constraint need to ensure that (for example) Joe's management of department 100 from 1/1/89 to 7/9/91 is recorded in exactly one tuple (and not for example in two tuples, one indicating 1/1/89 to 1/2/89 and the other indicating 1/3/89 to 7/9/91--I assume US style for dates, with the month first).

 

I could go on, as we do in the book.

 

One other point. Most experts agree that the two-column approach is undesirable, because the designer has to decide how the bounds are to be interpreted. Are they excluded or included in the intervals during which the indicated proposition holds true? Better to have INTERVAL data types, such as INTERVAL(DATE), as explained in our book.

 

 

From: Perry Valdez

 

I found an article titled What Is A Temporal Database by Chris Klassen. It has many references, but not including TEMPORAL DATA AND THE RELATIONAL MODEL. What can you say about it?

 

The article begins its discussion on the temporal database by this paragraph:

 

"We will begin by considering the general question of what a temporal database is. Here is an image that may prove helpful at the outset. The classical database is two-dimensional with columns and rows which intersect each other at cells which contain particular values. Now extend this flat two-dimensional database into a three- dimensional figure such as a cube. When you apply this concept to a database, instead of a flat two-dimensional figure you now have an extended three-dimensional figure with the third dimension being represented as various time intervals [Tansel, et al. 1993]."

 

Is it an (in)accurate picture of a temporal database? Because if the "classical database" refers to a relational database, then the article's description of it as something that is  "two-dimensional with columns and rows" bothers me, because from what I know, a relation with N attributes (roughly "columns") is N-dimensional. And it seems that the article confuses "database" with "table".

 

But I'm confused as well. If a "classical" relation has N dimensions, then does the "temporal version" of this relation have N + 1 dimensions?

 

Chris Klassen chose to discuss only "the relational model of the temporal database" because of space limitations, and excluded the object-oriented, network and hierarchical databases. But according to him, "Little temporal database research is being done with respect to the last two types of databases, but an increasing amount of research is being spent in the area of temporal object-oriented databases. This corresponds to the general trend of increasing interest in object-oriented databases." He didn't mention that active research is being done on temporal relational databases, too.

 

The article cites two technical definitions of a temporal database:

 

The official definition of temporal database is “a database that supports some aspect of time, not counting user-defined time”

--Richard Snodgrass, 1998

 

[A temporal database is] a union of two sets of relations Rs and R1, where Rs is the set of all static relations and R1 is the set of all time-varying relations.

--Shamkant Navathe and Rafi Ahmed, 1993

 

But can we say that *all* relations (including the non-temporal ones) are, in a sense, *time-varying*? Because their contents are different at different times? Or if ever there is a relation that is not time varying, then it would be an empty "read-only" relation?

 

The article also says, "While 'time-oriented database' and 'time-varying database' are equivalent in meaning to 'temporal database', 'historical database' is not. As we will see later in this paper, an 'historical database' is actually a subset of the temporal database."

 

The article discusses the different types of time data:

 

·   User-defined time (e.g., date/time that is encoded in a character string)

·   Transaction time (timestamp)

·   Valid time (the actual or real world time at which point the data is valid)

·   Decision time (the time a specific action occurred or will occur e.g. the time at which the manager decided to give an employee his raise)

 

The article also mentioned the following, which also bothers me:

 

One of the most important reasons to have a database which supports the temporal dimension, is the ability to perform ad hoc queries on the data. The current standard for conventional (relational) databases is Structured Query Language or SQL. SQL has become the industry standard for Relational Database Management Systems (RDBMS) because of its ease of use due to its English-like syntax. SQL is both feasible and user-friendly. The addition of the temporal element, however, greatly increases the complexity of the queries on temporal data. With the additional element of time, in its current form SQL is no longer able to process ad hoc queries as it did on the (relational) classical database. A new query language or extension to SQL is necessary.

 

But it's true that "The addition of the temporal element, however, greatly increases the complexity of the queries on temporal data", right?

 

 

From: Hugh Darwen

 

Thank you for drawing my attention to the paper by Chris Klassen.  I think it must have been written in about 1998.  None of the references carries a later date than that and it includes the hopelessly out-of-date statement: "Currently, a proposal to add TSQL to the existing ANSI and ISO SQL standards is under consideration by those governing bodies."  I was a member of the ISO "governing" body that finally rejected the TSQL proposals in 1996 and I was personally responsible for most of the papers that articulated the objections to this proposal.

 

I don't find Klassen's paper very useful.  It doesn't describe the problems that need to be described, nor does it say much about the possible solutions to those problems. As for defining the term, "temporal database", I see little point in attempting anything precise.  I use the term when I want to

raise the subject, but that's about all.  I am much more interested in pinpointing the real problems and proposing specific solutions.

 

Regarding dimensions, I fully agree with your comments.  One could say that the time dimension is represented by an appropriate interval-valued attribute in a relation, that isn't necessarily the third dimension in particular.  In any case, the structure remains "two-dimensional" in the sense that Klassen apparently means that term: "... two-dimensional with columns and rows which intersect each other at cells which contain particular values."  Klassen later mentions "bi-temporal tables".  If a table

has two time-interval-valued columns and we are to accept that adding one such column is to add a dimension, then adding two of them presumably adds two dimensions.  But how does such talk help us to understand the problems and their possible solutions?

 

I am well aware of Snodgrass's definition cited by Klassen and you.  I think it is fatuous.

 

Do you want me to say any more?

 

 

Posted 8/19/2005