MORE ON THE “FINAL NULL IN THE COFFIN”
with Fabian Pascal

 

 

 

From: Matt Rogish

To: Editor

Date: 31 May 2005

 

I finally had a chance to really sit down and read The Final NULL in the Coffin foundation paper on the flight to and from Atlanta. It certainly sounds like a good idea ... Thinking about it some more, I had a couple of questions:

 

1a) Constraint/Business rule application: Business rules/constraints would be applied to the "base" relation, correct? And if you wanted to do something like "All employees not on salary cannot work more than 32 hours" you'd basically put some sort of constraint on timecard_entry which was an existence check on sys_employee where attribute = salary and denies the transaction if the timecard_hour > 32?

 

1b) Wouldn't you need the user to specify the sub-relation names in order to effectively query them? Otherwise you end up with employee_1, employee_2 .. employee_16 -- none of which are particularly useful for the user. Otherwise the user must create views for every piece of missing data, e.g.

 

CREATE VIEW unsalaried_emps AS

SELECT EMP# FROM emp

WHERE EXISTS (SELECT *

              FROM sys_employee

              WHERE ... AND attribute = salary)

 

That's not a bad thing per se—it forces you to consider what data can truly be missing—maybe it will remind the database designer to more effectively gather their data.

 

2a) On multi-relations: For a join wouldn't the sub-relation count get out of hand sort of rapidly as the max number of sub-relations would necessarily be a Cartesian product of all subsets of missing data, no? I guess it's another compelling reason to ensure you gather all your data!

 

2b) Would views also have this sub-relation mechanism behind them? e.g. if you create a view on salary and home address (some addresses contain various pieces of inapplicable data, such as apartment #, etc.) would the sub-relations be ... derived from the view or would it be a projection of the base tables' sub-relations? I suppose it really doesn't matter if you have proper view updatability.

 

 

From: Fabian Pascal

To: Matt Rogish

 

The paper gives just the outline of the solution. More research is necessary to flush out the details—

the impact on constraints and operators (see below). Good subject for a PhD thesis.

 

The user applies everything to the table as if it were a relation. If there are missing data, the DBMS maps the constraints and operators to multiple "sub-relations" transparently to the user. Courtesy of relational theory, we believe a TRDBMS can do that.

 

Don't understand the question about time-cards.

 

A TRDBMS would internally keep track of the "sub-relations". Externally the user always works with the "main" relation, never with the "sub-relations", including views (that’s what transparency means).

 

The paper discusses the issue of relation proliferation: given a TransRelational implementation, the "sub-relation" count is not an issue. And yes, there should be a clear incentive to disallow missing values. SQL does not provide one, and lures into a false sense of security with NULLs.

 

Correct: view updatability requires support of constraint inheritance, which would enable a TRDBMS  to do the necessary mapping.

 

 

From: Matt Rogish

 

Ah OK, understood.

 

Yeah, what I wrote doesn't make sense; I meant to say—how do you specify a predicate that says something like "If employee [EMP#]'s salary is unknown, then don't give them benefits [X,Y,Z]"? I guess ... why would you make a predicate that applied only to missing data? Hmm ... It also would seem that most examples I can come up with would be handled with the internal "sub-relation" concept.

 

Given this, then in order to effectively write that rule you'd have to do a lookup on the catalog relation recording missing data information for the main table. So, you would still have to expose the catalog relation? Or, would you (externally) still use something like:

 

SELECT *

FROM employee

WHERE salary IS UNKNOWN?

 

In that case, when the user says

 

SELECT *

FROM employee

 

what does the DBMS return for the attributes with missing values? I thought the DBMS would return the base employee relation for all tuples without missing values then it would return multiple sub-relations—one for each of the particular combination of attributes that existed for a given tuple. So, the application developer would have to explicitly expect more than one result relation?

 

I wasn't speaking to the implementation inefficiency (e.g. 30-thousand tables in SQL != 30-thousand tables in TRM)—but simply managing the logical sub-relations. If I was an application developer with m missing attributes in one relation and n missing attributes in another and I join the two, isn't it possible (depending on the particular query) to end up with m * n sub-relations?

 

Speaking another way, let's say I have (* denotes attributes with missing values):

 

EMPLOYEE (EMP#,*SALARY,*SUPERVISOR#)

PROJECT (PROJECT#,PROJECT_MANAGER#,*PROJECT_NAME,*DEPARTMENT#)

 

And I want a query that gets employee data along with the projects they manage:

 

SELECT emp#,salary,supervisor#,project#,project_name,department#

FROM employee,project

WHERE emp# = project_manager#

 

Wouldn't the necessarily need to generate and return something like 24 (4! factorial) sub-relations to satisfy your query:

 

SUB1 (EMP#,SALARY,SUPERVISOR#,PROJECT#,PROJECT_NAME,DEPARTMENT#)

SUB2 (EMP#,SALARY,SUPERVISOR#,PROJECT#,PROJECT_NAME)

SUB3 (EMP#,SALARY,SUPERVISOR#,PROJECT#)

SUB4 (EMP#,SALARY,PROJECT#)

SUB5 (EMP#,PROJECT#)

SUB6 (EMP#,SALARY,SUPERVISOR#,PROJECT#,DEPARTMENT#)

SUB7 (EMP#,SALARY,PROJECT#,DEPARTMENT#)

SUB8 (EMP#,PROJECT#,DEPARTMENT#)

 

etc.

 

Obviously if the answer to the above is that you deal with the base relation exclusively then it really doesn't matter how many are generated.

 

 

From: Fabian Pascal

 

That would have to be a syntax, not semantics issue. But bear in mind that querying the metadata is not the same as querying the sub-relations directly: querying metadata is not defeating transparency.

 

No, you end up with one table, with the system keeping track of sub-relations. You always deal with the former, system maps to the latter.

 

Precisely. All the drudgery is where it belongs: with the DBMS.

 

 

Posted 7/29/05