From: Matt Rogish
To: Editor
Date: 31 May 2005
I finally had a chance to really sit down and read The Final NULL in the
Coffin foundation paper on the flight to and from Atlanta. It certainly
sounds like a good idea ... Thinking about it some more, I had a couple of
questions:
1a) Constraint/Business rule application: Business
rules/constraints would be applied to the "base" relation, correct?
And if you wanted to do something like "All employees not on salary cannot
work more than 32 hours" you'd basically put some sort of constraint on
timecard_entry which was an existence check on sys_employee where attribute =
salary and denies the transaction if the timecard_hour > 32?
1b) Wouldn't you need the user to specify the sub-relation
names in order to effectively query them? Otherwise you end up with employee_1,
employee_2 .. employee_16 -- none of which are particularly useful for the user.
Otherwise the user must create views for every piece of missing data, e.g.
CREATE VIEW unsalaried_emps AS
SELECT EMP# FROM emp
WHERE EXISTS (SELECT *
FROM
sys_employee
WHERE ...
AND attribute = salary)
That's not a bad thing per se—it forces you to consider what
data can truly be missing—maybe it will remind the database designer to more
effectively gather their data.
2a) On multi-relations: For a join wouldn't the sub-relation
count get out of hand sort of rapidly as the max number of sub-relations would
necessarily be a Cartesian product of all subsets of missing data, no? I guess
it's another compelling reason to ensure you gather all your data!
2b) Would views also have this sub-relation mechanism behind
them? e.g. if you create a view on salary and home address (some addresses
contain various pieces of inapplicable data, such as apartment #, etc.) would
the sub-relations be ... derived from the view or would it be a projection of
the base tables' sub-relations? I suppose it really doesn't matter if you have
proper view updatability.
From: Fabian Pascal
To: Matt Rogish
The paper gives just the outline of the solution. More
research is necessary to flush out the details—
the impact on constraints and operators (see below). Good
subject for a PhD thesis.
The user applies everything to the table as if it were a
relation. If there are missing data, the DBMS maps the constraints and
operators to multiple "sub-relations" transparently to the user.
Courtesy of relational theory, we believe a TRDBMS can do that.
Don't understand the question about time-cards.
A TRDBMS would internally keep track of the
"sub-relations". Externally the user always works with the
"main" relation, never with the "sub-relations", including views
(that’s what transparency means).
The paper discusses the issue of relation proliferation:
given a TransRelational
implementation, the "sub-relation" count is not an issue. And
yes, there should be a clear incentive to disallow missing
values. SQL does not provide one, and lures into a false sense of security with
NULLs.
Correct: view updatability requires support of constraint
inheritance, which would enable a TRDBMS
to do the necessary mapping.
From: Matt Rogish
Ah OK, understood.
Yeah, what I wrote doesn't make sense; I meant to say—how do
you specify a predicate that says something like "If employee [EMP#]'s
salary is unknown, then don't give them benefits [X,Y,Z]"? I guess ... why
would you make a predicate that applied only to missing data? Hmm ... It also
would seem that most examples I can come up with would be handled with the
internal "sub-relation" concept.
Given this, then in order to effectively write that rule
you'd have to do a lookup on the catalog relation recording missing data
information for the main table. So, you would still have to expose the catalog
relation? Or, would you (externally) still use something like:
SELECT *
FROM employee
WHERE salary IS UNKNOWN?
In that case, when the user says
SELECT *
FROM employee
what does the DBMS return for the attributes with missing
values? I thought the DBMS would return the base employee relation for all
tuples without missing values then it would return multiple sub-relations—one
for each of the particular combination of attributes that existed for a given
tuple. So, the application developer would have to explicitly expect more than
one result relation?
I wasn't speaking to the implementation inefficiency (e.g. 30-thousand
tables in SQL != 30-thousand tables in TRM)—but simply managing the logical
sub-relations. If I was an application developer with m missing attributes in
one relation and n missing attributes in another and I join the two, isn't it
possible (depending on the particular query) to end up with m * n
sub-relations?
Speaking another way, let's say I have (* denotes attributes
with missing values):
EMPLOYEE (EMP#,*SALARY,*SUPERVISOR#)
PROJECT (PROJECT#,PROJECT_MANAGER#,*PROJECT_NAME,*DEPARTMENT#)
And I want a query that gets employee data along with the
projects they manage:
SELECT emp#,salary,supervisor#,project#,project_name,department#
FROM employee,project
WHERE emp# = project_manager#
Wouldn't the necessarily need to generate and return something
like 24 (4! factorial) sub-relations to satisfy your query:
SUB1 (EMP#,SALARY,SUPERVISOR#,PROJECT#,PROJECT_NAME,DEPARTMENT#)
SUB2 (EMP#,SALARY,SUPERVISOR#,PROJECT#,PROJECT_NAME)
SUB3 (EMP#,SALARY,SUPERVISOR#,PROJECT#)
SUB4 (EMP#,SALARY,PROJECT#)
SUB5 (EMP#,PROJECT#)
SUB6 (EMP#,SALARY,SUPERVISOR#,PROJECT#,DEPARTMENT#)
SUB7 (EMP#,SALARY,PROJECT#,DEPARTMENT#)
SUB8 (EMP#,PROJECT#,DEPARTMENT#)
etc.
Obviously if the answer to the above is that you deal with
the base relation exclusively then it really doesn't matter how many are
generated.
From: Fabian Pascal
That would have to be a syntax, not semantics issue.
But bear in mind that querying the metadata is not the same as querying the
sub-relations directly: querying metadata is not defeating transparency.
No, you end up with one table, with the system keeping
track of sub-relations. You always deal with the former, system maps to the
latter.
Precisely. All the drudgery is where it belongs: with the
DBMS.
Posted 7/29/05