AP: Just got this remark from one of my pages:
I was asked to look at NULLs in a Database by someone for whom I'd done some performance improvements to their production database, which included making a number of fields nullable. The basis of the argument in the article - that the value of a NULL field is unknown - is completely wrong. We know precisely what the value is. A NULL value in a field means that it "has no value". This is not the same as unknown, unsure, don't know, forgotten, haven't been told, etc... as all of those imply that is does have some value. It doesn't. Space precludes me from giving a full explanation of this concept, but on request I will happily do that plus give examples. But your example of count(available) from accounts belies your lack of understanding of the concept. With three rows in the table of which one is NULL, the count MUST return two, as only two rows have a value. The rules of mathematics state this.Well, well.
Regarding your final points, introducing NULLs does not mean n-valued logic - it means 3-valued logic (three state logic). Neither does it introduce uncertainty. In fact, the reverse is true. I have a recent example where a number field was made not nullable, and held zero when it should have held NULL. The problem was they could not differentiate between records that really had a zero value and those that had just defaulted to zero (and should have been NULL). I have been designing relational databases for twenty years now, and it worries me that such articles get published where a lack of understanding of basic mathematics is shown. People will get the wrong impression, and write bad solutions. The only good thing is that as a freelance, I may get plenty of work.
This is one of the many excellent demonstrations that SQL's NULL is a bad idea: it confuses practitioners over and over and yet they fail to draw the logical conclusion that the confusion is due precisely to the problems of three-valued logic (3VL) and its bad implementation as NULL in SQL (which is itself due to the same).
If "a column does not have some value", there are only two possibilities: either there should be a value, but it is missing because it is unknown, or the value is inapplicable, in which case nothing is missing.
He seems to imply that NULL means inapplicable and that is what 3VL is about. Alas, if that is the case, NULL is inapplicable too: if certain attributes do not apply for some entities, the columns representing those attributes should not apply to the rows representing those entities. In other words, there is an entity supertype-subtype situation and the table was improperly designed and this created a fake need for NULLs.
That is why I agree that we need to worry only about 3VL--true/false/unknown, but that leaves us enough to worry about. By focusing on their representation in tables, but ignoring their implications for integrity enforcement and manipulation most practitioners are lured by NULLs into a false sense of security. An enormous amount of evidence for the problematics involved has been provided by C. J. Date and others and I won't repeat it here (see the references in my The Final NULL in the Coffin paper).
If his case is about unknown, here are some examples:
- Confusion #1: It is not the "NULL field" that is unknown, but the value that ought to reside in that field and is missing.
- Confusion #2: A "NULL value" is a contradiction in terms. Per his own admission the whole point of a NULL is precisely that it is not a value, but rather a mark for the absence thereof. How can "there is no value" mean "we know exactly what the value is"?
- Confusion #3: What is the query to which COUNT(available) is the answer? If the DBMS yields 2, it answers the question "What is the number of accounts with known information"; if it yields 3 it answers the question "What is the total number of accounts?". Which did the user intend, on what grounds does the SQL DBMS decide and does the user know which query was answered? Does he care? He certainly should.
What about those 0's? Well, if they stood for inapplicable, then they were due to poor design, not to non-nullability and repairing the design would correct the mistake. But if they stood for unknown, then all the problematics of 3VL and its bad SQL implementation would kick in.
What is more, SQL's NULL is used for both unknown and inapplicable. This exacerbates the problems by applying a poorly implemented SQL 3VL to a 4VL situation. This would not be a problem if nobody ever misdesigned tables with NULLs representing inapplicable, but it looks like this is exactly what happened in his case and yet he defends NULLs and criticizes others for what he is guilty of.
For an outline of a relational solution to missing data without NULLs, see my above mentioned paper.
Do you like this post? Please link back to this article by copying one of the codes below.URL: HTML link code: BB (forum) link code: