From: Perry Valdez
Date: 14 Jun 2005
May I ask, is there a recommended way to design a historical
(temporal) database using SQL products?
(With ad hoc workarounds, maybe?)
I found the following example of a temporal database:
Department:
D# Manager Start
End
___ _______ ________
________
100 Joe 01/01/89 07/09/91
100 Jim 08/09/91 NULL
200 Jane 05/04/88
02/06/90
200 Jack 03/06/90
NULL
Employee:
Employee D# Start
End
________ ___ ________
________
Fred 100 03/05/87
04/02/90
Fred 200 05/02/90
NULL
Result (Join of "Department" and "Employee"
tables):
D# Manager Employee
Start End
___ _______ ________
________ ________
100 Joe Fred 01/01/89 04/02/90
200 Jane Fred 05/02/90
02/06/90
200 Jack Fred 03/06/90 NULL
The NULLs there indicate that the data is still current. But
I'm worried about this kind of database design because of the presence of
NULLs. So what are the better alternatives to represent data that's
still current? Are timestamp columns a better alternative?
It seems that I cannot totally avoid using stored procedures
and triggers in designing temporal databases in SQL. But at least I can still
formulate and implement the overlap constraints there correctly and
"natively" (i.e., without resorting to triggers and stored
procedures), right?
From: Hugh Darwen
The book Temporal Data and The Relational Model doesn't help
you with the current state of the art in SQL, I'm afraid. Rather, we wrote it
to help people express requirements on SQL to the vendors of SQL DBMSs, as well
as to show how the problems of temporal data might be addressed in a truly
relational database language. (SQL is so badly designed, and so
anti-relational, that nobody has yet come up with a good proposal for extending
that particular language to address these problems.)
You need to consider not only how to design the database, but
also how to write queries and express constraints on temporal data. And you are
100% right, in my opinion to worry about those NULLs.
For example, consider the constraint required to express the
fact that no department can have more than one manager at any point in time.
You could call that a temporal counterpart of the regular PRIMARY KEY
constraint, but you'd be hard pushed to express it in standard SQL, and even
harder pushed in most of the SQL products available today.
For another example, consider the query, "For each
department, show the intervals throughout which it was or is without a
manager."
For another example, consider the constraint need to ensure
that (for example) Joe's management of department 100 from 1/1/89 to 7/9/91 is
recorded in exactly one tuple (and not for example in two tuples, one
indicating 1/1/89 to 1/2/89 and the other indicating 1/3/89 to 7/9/91--I assume
US style for dates, with the month first).
I could go on, as we do in the book.
One other point. Most experts agree that the two-column
approach is undesirable, because the designer has to decide how the bounds are
to be interpreted. Are they excluded or included in the intervals during which
the indicated proposition holds true? Better to have INTERVAL data types, such
as INTERVAL(DATE), as explained in our book.
From: Perry Valdez
I found an article titled What
Is A Temporal Database by Chris Klassen. It has many references, but
not including TEMPORAL DATA AND
THE RELATIONAL MODEL. What can you say about it?
The article begins its discussion on the temporal database by
this paragraph:
"We will begin by considering the general question of
what a temporal database is. Here is an image that may prove helpful at the
outset. The classical database is two-dimensional with columns and rows which
intersect each other at cells which contain particular values. Now extend this
flat two-dimensional database into a three- dimensional figure such as a cube.
When you apply this concept to a database, instead of a flat two-dimensional
figure you now have an extended three-dimensional figure with the third
dimension being represented as various time intervals [Tansel, et al.
1993]."
Is it an (in)accurate picture of a temporal database? Because
if the "classical database" refers to a relational database, then the
article's description of it as something that is "two-dimensional
with columns and rows" bothers me, because from what I know, a relation
with N attributes (roughly "columns") is N-dimensional. And it seems
that the article confuses "database" with "table".
But I'm confused as well. If a "classical" relation
has N dimensions, then does the "temporal version" of this relation
have N + 1 dimensions?
Chris Klassen chose to discuss only "the relational
model of the temporal database" because of space limitations, and excluded
the object-oriented, network and hierarchical databases. But according to
him, "Little temporal database research is being done
with respect to the last two types of databases, but an increasing amount of
research is being spent in the area of temporal object-oriented databases. This
corresponds to the general trend of increasing interest in object-oriented
databases." He didn't mention that active research is being done on
temporal relational databases, too.
The article cites two technical definitions of a temporal
database:
The official definition of temporal database is “a database that
supports some aspect of time, not counting user-defined time”
--Richard Snodgrass, 1998
[A temporal database is] a union of two sets of relations Rs and
R1, where Rs is the set of all static relations and R1 is the set of all
time-varying relations.
--Shamkant Navathe and Rafi Ahmed, 1993
But can we say that *all* relations (including the
non-temporal ones) are, in a sense, *time-varying*? Because their contents are
different at different times? Or if ever there is a relation that is not time
varying, then it would be an empty "read-only"
relation?
The article also says, "While 'time-oriented database'
and 'time-varying database' are equivalent in meaning to 'temporal database',
'historical database' is not. As we will see later in this paper, an
'historical database' is actually a subset of the temporal database."
The article discusses the different types of time data:
·
User-defined time (e.g., date/time that is encoded in a
character string)
·
Transaction time (timestamp)
·
Valid time (the actual or real world time at which
point the data is valid)
·
Decision time (the time a specific action occurred or
will occur e.g. the time at which the manager decided to give an employee his
raise)
The article also mentioned the following, which also bothers me:
One of the most important reasons to have a database which
supports the temporal dimension, is the ability to perform ad hoc queries on
the data. The current standard for conventional (relational) databases is
Structured Query Language or SQL. SQL has become the industry standard for
Relational Database Management Systems (RDBMS) because of its ease of use due
to its English-like syntax. SQL is both feasible and user-friendly. The
addition of the temporal element, however, greatly increases the complexity of
the queries on temporal data. With the additional element of time, in its
current form SQL is no longer able to process ad hoc queries as it did on the
(relational) classical database. A new query language or extension to SQL is
necessary.
But it's true that "The addition of the temporal
element, however, greatly increases the complexity of the queries on temporal
data", right?
From: Hugh Darwen
Thank you for drawing my attention to the paper by Chris
Klassen. I think it must have been written in about 1998. None of
the references carries a later date than that and it includes the hopelessly
out-of-date statement: "Currently, a proposal to add TSQL to the existing
ANSI and ISO SQL standards is under consideration by those governing
bodies." I was a member of the ISO "governing" body that
finally rejected the TSQL proposals in 1996 and I was personally responsible
for most of the papers that articulated the objections to this proposal.
I don't find Klassen's paper very useful. It doesn't
describe the problems that need to be described, nor does it say much about the
possible solutions to those problems. As for defining the term, "temporal
database", I see little point in attempting anything precise. I use
the term when I want to
raise the subject, but that's about all. I am much more
interested in pinpointing the real problems and proposing specific solutions.
Regarding dimensions, I fully agree with your comments.
One could say that the time dimension is represented by an appropriate
interval-valued attribute in a relation, that isn't necessarily the third
dimension in particular. In any case, the structure remains
"two-dimensional" in the sense that Klassen apparently means that
term: "... two-dimensional with columns and rows which intersect each
other at cells which contain particular values." Klassen later
mentions "bi-temporal tables". If a table
has two time-interval-valued columns and we are to accept
that adding one such column is to add a dimension, then adding two of them
presumably adds two dimensions. But how does such talk help us to
understand the problems and their possible solutions?
I am well aware of Snodgrass's definition cited by Klassen
and you. I think it is fatuous.
Do you want me to say any more?
Posted 8/19/2005