ON VARIABLES AND VALUES
with Chris Date

 

 

 

From: AD

To: Editor

 

First off, I want to congratulate all of you at dbdebunk.com for providing an invaluable service.  (Valuable at least to those who can appreciate it!)  I myself, having come from an application programming background, was a little rusty on the true power of relational theory, especially given that the existing implementations (i.e., SQL DBMSs) leave so much to be desired.  In fact, connecting databases to set theory was an "Aha!" that should have occurred to me years ago. Now, after several months of consuming both the website content and several books (while simultaneously developing a data-intensive application at work), I'm feeling much better about my own understanding of relational theory.

 

However, there are some points I find myself in disagreement with. I'm not convinced that the distinction between variables and values is as pronounced as has been claimed.  For example, the statement:

 

x := 2

 

is logically equivalent to a statement like:

 

UPDATE storage

SET value = 2

WHERE location = x

 

I.e., variable update is synonymous with tuple update.  The fact that locations may happen to map to memory addresses is purely a physical implementation detail, and has nothing to do with the logical meaning.  In fact, variable assignment means exactly "remove one association from the system and add another."  And the association itself is a value.  Perhaps the only thing that could uniquely be considered a variable is the whole system.
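A minimal sketch of this reading in Python, under stated assumptions: `storage` is modelled as a relation value (a frozen set of (location, value) tuples keyed on location), and `assign` is a hypothetical helper, not part of any real system. Assignment removes one association and adds another; the only name that is ever rebound is `storage` itself.

```python
# Hypothetical model: the whole "system" is one relation value,
# a frozen set of (location, value) tuples with location as the key.

def assign(storage, location, value):
    """Return a new relation value: drop the old association for
    `location` (if any) and add the new (location, value) tuple."""
    kept = {(loc, val) for (loc, val) in storage if loc != location}
    return frozenset(kept | {(location, value)})

# The one thing that varies is the name `storage`.
storage = frozenset({("x", 1), ("y", 7)})

# x := 2, expressed as "remove one association and add another".
storage = assign(storage, "x", 2)
```

Each tuple is an immutable value here; what changes over time is which relation value the name `storage` refers to.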

 

So if we extend this to more complex objects, we can map the following:

 

anEmployee.setDepartment(aDepartment)

 

to

 

UPDATE employee

SET department_id = aDepartment

WHERE employee_id = anEmployee

 

Claims to the contrary notwithstanding, it is clear that the simplest, most direct, and most complete mapping between Object and Relational systems is:

 

·   class = relvar

·   instance = tuple

·   field = attribute

·   object id = primary key

·   pointer = foreign key
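The mapping listed above can be sketched with Python's standard `sqlite3` module. The table layout, the sample rows, and the `set_department` helper are assumptions made for illustration only; the letter gives just the UPDATE statement and the list of correspondences.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless asked to.
conn.execute("PRAGMA foreign_keys = ON")

# class = relvar, field = attribute, object id = primary key
conn.execute(
    "CREATE TABLE department ("
    " department_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL)"
)
# pointer = foreign key
conn.execute(
    "CREATE TABLE employee ("
    " employee_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " department_id INTEGER NOT NULL"
    "   REFERENCES department(department_id))"
)

# instance = tuple (illustrative rows)
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(10, "Sales"), (20, "Research")])
conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (1, "Pat", 10))

def set_department(conn, employee_id, department_id):
    """Counterpart of anEmployee.setDepartment(aDepartment)."""
    conn.execute(
        "UPDATE employee SET department_id = ? WHERE employee_id = ?",
        (department_id, employee_id),
    )

set_department(conn, 1, 20)

# A join takes the place of following the pointer.
row = conn.execute(
    "SELECT e.name, d.name FROM employee e"
    " JOIN department d ON d.department_id = e.department_id"
    " WHERE e.employee_id = ?",
    (1,),
).fetchone()
```

With the foreign key enforced, an attempt to point an employee at a department that does not exist is rejected by the DBMS rather than left dangling.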

 

To repeat: the fact that locations may happen to map to memory addresses is purely a physical implementation detail, and has nothing to do with the logical meaning.  The biggest failure of object systems is, in fact, the utter lack of a query mechanism, and this is the source of all the pointer chasing.  (Please note: nothing written here should be misconstrued as an argument for Objects instead of Relational or any similar nonsense.  I am a firm proponent of relational systems.  My point is that the gulf is not nearly so wide as it is sometimes portrayed.)

 

Further, I would argue that introducing domains to relational theory is, in fact, a serious misstep.  The relational model needs no extension.  (Or perversion!)  The clearest indications of this are:

 

·   any complex entity can be represented EITHER by a relation OR by a domain (duplication of functionality)

·   any read-only operator (i.e., a "function") can be replaced by an equivalent relation and a join (duplication of functionality)

·   subtyping by specialization can easily be defined via views

 

Note that some relations may be virtual and predefined (relconsts).  I.e., integer(x) could be a predefined relation.  So could greater_than(x, y), which could also include the built-in constraint greater_than(x, y) -> integer(x) AND integer(y) (or not, if a polymorphic relation were desired).
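A rough Python sketch of the relconst idea, with loud assumptions: `INTEGER` and `GREATER_THAN` are finite stand-ins for what would be infinite predefined relations, and the `stock` relation and its attributes are invented for illustration. The comparison "qty > threshold" is evaluated as a natural join against `GREATER_THAN` (after renaming x, y to qty, threshold) instead of by calling an operator.

```python
# Finite stand-ins for the predefined relations integer(x) and
# greater_than(x, y); a real relconst would not be enumerated like this.
INTEGER = frozenset(range(0, 5))
GREATER_THAN = frozenset(
    (x, y) for x in INTEGER for y in INTEGER if x > y
)

# The built-in constraint: greater_than(x, y) -> integer(x) AND integer(y)
constraint_holds = all(
    x in INTEGER and y in INTEGER for (x, y) in GREATER_THAN
)

# Hypothetical relation with attributes (item, qty, threshold).
stock = {("bolt", 3, 1), ("nut", 0, 2), ("washer", 4, 4)}

# Join on (qty, threshold) = (x, y): keeps exactly the tuples
# for which a matching greater_than tuple exists.
above_threshold = {
    (item, qty, threshold)
    for (item, qty, threshold) in stock
    for (x, y) in GREATER_THAN
    if (qty, threshold) == (x, y)
}
```

The result contains only the tuple whose qty exceeds its threshold; whether this counts as removing the operator or merely relocating it is the point under dispute in the exchange.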

 

So, there are (at most) only two domains that are required: ID and STRING (and the second one is only really necessary for a text-based interactive user session; in application environments, the system can map other types to and from STRING externally).  And there is one constraint needed: primary keys on type tables are unique across all type tables (this is how strong typing is achieved; it could be relaxed slightly to allow inheritance).

 

I think adopting a pure relational model is so much cleaner than rehashing all the same issues with domains.  (In fact, I couldn't even get all the way through THE THIRD MANIFESTO, because I so vehemently disagree with the initial premises.)

 

I would very much appreciate feedback on these points.  I think the database industry is very much in flux right now, and it is imperative that energies be focused in the right places to effect the right kind of change.

 

 

Chris Date Responds: I found this message something of a curate's egg:  Right on several counts, and yet very wrong on several counts too.  I don't think I want to get into a blow-by-blow response here; instead let me just offer the following comments. 

 

Overall, I think Dommasch is thinking "too close to the metal."  It's true, close to the metal, that we have just storage.  It's also true, close to the metal, that whether we treat a particular piece of storage as containing the same thing at all times ("constants") or different things at different times ("variables") is up to us (how we manage that storage).  And so on. (See the article "Why 'the Object Model' Is Not a Data Model" in my book RELATIONAL DATABASE WRITINGS 1994-1997.)

 

But: It seems to me that Dommasch's whole argument is like the argument that says all we need is bits:  We can do everything we want in terms of bits.  Ultimately, of course, this argument is correct, as is demonstrated by the very fact that digital computers exist at all.  But what we do in practice is group useful bundles of bit-based concepts to form higher-level abstractions (e.g., character strings, decimal numbers).  Then we group those again to form still higher-level abstractions (e.g., records, files).  Then we group those again to form still higher-level abstractions (e.g., directories, databases).  Then ... but you get the picture.  Thus, I contend that: 

 

·   Types, values, variables, and operators are useful high-level abstractions.  We learned this with Fortran and it's still true today, nearly 50 years later.

 

·   In particular, relation types, relation values, relation variables, and relational operators are very useful abstractions.  We learned this with the relational model and it's still true today, over 30 years later.  [Ed. Comment: Weeellll, most have not learned this, even though it’s still true today].

 

High-level abstractions (at least, good ones) are useful because they raise the level of discourse, they allow us to focus on problems more closely related to the real problems of the real world without getting bogged down in irrelevant details, and (just incidentally) they are probably easier to implement efficiently. The entire history of the computing field could be characterized as a search for ever higher, ever more useful abstractions.

 

PS: One assertion of Dommasch's that I disagree with quite strongly is that relations and domains represent "duplication of functionality."  See Appendix C of FOUNDATION FOR FUTURE DATABASE SYSTEMS: THE THIRD MANIFESTO, where this nonissue is discussed in some detail.

 

 

Posted 10/04/02

 

 

 
