Saturday, September 3, 2016

Relation Predicates and Identical Relations



Update: MEANING: IT'S NOT IN THE NAME!

16 comments:

  1. "The two relations have different meanings and, therefore, there should be some attribute or constraint differentiating the RPs. "

    There is, the table name. The table name itself is semantic content. The argument presented here treats the table name as something other than semantic content, which is incorrect. If it is necessary to encode table names and relationships into the database explicitly, then one should do so.

    The VIEWS and DOWNLOAD tables do not have a direct relationship to each other, nor is their relationship orthogonal. They are different semantic features of user behavior. It seems the obvious solution is to treat this behavior semantically in the same table:

    USERACTION (USER_ID, ITEM_ID, ACTIONTYPE);

    Action type being something like V for view and D for download. But it could also expand to include things like U for upload, or S for save-for-later.

    Personally, I find these logic based approaches to database design interfere with the creative process necessary to design databases. The logic approach move us away from treating the data semantically, and forces us towards treating the data syntactically and reductively, which confuses us when we face certain design problems. By treating tables themselves as semantic content, it becomes obvious that the table can be a value. And thus we can have an additional column which captures that table meaning in a superset of the VIEWS and DOWNLOAD rows. this kind of synthetic solution is not an obvious outcome from predicate style logics - unless the database itself becomes part of the predicate system. and as that feels like we might be slipping towards Godel's theorem, I'll stop.

    Thanks for an interesting article.

    ReplyDelete
    Replies
    1. The entire article and its claims are predicated on a very specific interpretation of the Information Principle (that what Date calls an external relvar predicate is itself information too and must therefore be accessible to the DBMS, with all that that entails). With no agreement on that interpretation, surely there's no point in debating/discussing.

      The "external relvar predicate" way of working is known to work well. It was the way of working even in the pre-database era : records in the file named X represent meaning Y.

      What the alternative looks & feels like, I think nobody knows that yet, including DMG himself.

      Delete
    2. There is a formal FOPL relation predicate (RP) which, when expressed in some specific relational data language, is a conjunction of declarative integrity constraints that a RDBMS enforces. The RP has an informal expression in natural language. Date chose to call them "internal" and "external" predicates, but so what? (If you read David's chapters he documents the frequent changes in Date's definitions of the two over time.)

      A relation name is only a shorthand for the RP and the RP surely does not consist of only attribute names as per Date. A RP can capture and symbolize more meaning than just attribute names, including verbs.

      Date just does not like the conceptual level because it's informal. He wants to stick strictly to the logical level, which renders the endeavor too abstract to be useful.

      Delete
    3. You seem to think that what Date calls the external predicate, is just the NL formulation of the internal one. That is wrong.

      You also seem to have arrived at the conclusion that according to Date, what you call the RP (and which he calls "internal predicate"), "consists of only attribute names". Given that you have just stated yourself that your RP is "a conjunction of integrity constraints", it is hard to imagine anyone, let alone Date, believing that such a thing can "consist of only attribute names".

      Delete
    4. How IS the external predicate different than the informal version of the internal one? How CAN it be?

      No, that is not what I said. What I said was that he uses the attribute names as meaning of relations.

      Delete
    5. Here is what you said : "the RP surely does not consist of only attribute names as per Date". That is what you said.

      The external predicate differs from the internal one in that it contains the part that cannot be DBMS-understood, which is why it is called "external", as opposed to the internal one, which is entirely DBMS-understood, which is why it is called "internal".

      Delete
    6. That was just a quick expression, not detailed.

      The DBMS does not have to understand anything in a semantic sense and it doesn't really even those parts that Date thinks it does understand.

      The meaning assigned by the db designer is all FOPL SYMBOLIZED, all the DBMS does is manipulates the symbols consistently. Semantics are applied by users AFTER symbol manipulation to interpret results based on in-system documentation of the symbol meaning which should be accessible to users on demand--something which SQL DBMS's don't do but RDBMSs should and would.

      Delete
  2. As a matter of principle I decided not to respond to anonymous comments. But I will suggest you re-read the article more carefully, with focus on relation and attribute names in the context of the IP and the definition of a RP.

    ReplyDelete
  3. My apologies for the anon. I simply chose the google option.

    I read it a few times, and then typed a response, and then read it again. Your solution:

    "One way to differentiate RPs is to represent the verbs as values of an attribute. Assuming there are only two actions:
    User (USER_ID) performed action (ACTION) on item (ITEM_ID)."

    is obviously the right solution.

    My confusion was in trying to read through and understand what you were saying with so many abbreviations. And when I read your solution it seemed you were dismissing it because it did not satisfy the POOD or it did and POOD was irrelevant...and it violates the... RDM...? oh, relational data model...

    either way, I still feel that a logic approach presents certain problems in design, as axiomatic features (tables) in a FOPL can become new semantic content which should produce a redesign in the database - which seemed to be the posters underlying problem. Because often in software development, we do not know what all the facts are until we start working with an inadequate model.

    -arthole

    ReplyDelete
    Replies
    1. No need to apologize--you are free to post anonymously, but if you want a response, I require to know to whom I am speaking.

      If you introduce yourself I'll reply.

      Delete
    2. I agree with the solution which resolves the false premise of the model: views and downloads are merely different aspects of the same object/entity and nothing on their own which is why they share primary keys.

      I can also agree that Fabian's style can be a little overly academic and that it's easy to trip up over the over the abbreviation ("IP" is used in so many contexts in IT (sic) that it should come with a health warning!) ;-) But in general I love his writing: Practical Issues in Database Management is a fantastic read and that it is out of print tells us more about the inability to think in the industry than it does about the book.

      But I think that the main point that is, that if your conceptual model has ambiguities then your DB schema is going to be a mess. And, as usual, some of the answers provided on things like Stack Overflow are positively harmful: A4 sticks out here! But a lot depends on the question you ask and this should be a question about the conceptual model and not the schema.

      Delete
    3. Charlie,

      When readers focus on style and abbreviations, they signal something about their ability to comprehend the substance.

      I have no idea what is academic about my style. My writing does not contain fluff, it is to the point and I strive to be precise, which is precisely what is missing in most of the published material these days.

      As to abbreviations, I always attach them to the full term first and I use them to save typing when I have to repeat them. There is no possibility to confuse them to other common uses.

      The point of the RDM is to maximize db mgmt. ***by the DBMS***. That is why all information must be represented EXPLICITLY and in EXACTLY one way, which is how much of the simplicity is achieved. Anything that's left only in the mind of the users is trouble.

      Delete
  4. If you want an example of how lack of education on fundamentals can handicap practitioners to they point of inability to comprehend anything beyond tools and products (and even that with limits), then check out the exchange that my post has triggered.

    https://news.ycombinator.com/item?id=12437389&goto=news

    There are a couple of exceptions, but other than that, it's very sad to see the level of intellect in the profession (or, more accurately, lack thereof).

    ReplyDelete
  5. Incidentally, there is criticism of the RDM that it does not capture sufficient meaning, particularly verbs (which is, of course, a misconception due particularly to how Date's treatment of attribute names. But a demonstration that verbs CAN be captured gets objected to with "I have not read this in any book".

    ReplyDelete
  6. Erwin,

    The issue is really how much of the semantics is symbolized in RP's and it is simply false that verbs cannot be.

    ReplyDelete
  7. Regarding the apt complaint "The entire article and its claims are predicated on a very specific interpretation of the Information Principle ... With no agreement on that interpretation, surely there's no point in debating/discussing":

    A frequent interpretation of the IP is that it is to be obeyed so that the intent or casual descriptions of various properties can be reflected either by table names or by attribute values and no more is a said. There's a big difference between a name and a value. Rarely do data designers talk of how to enable user programs to apply the IP.

    In practice, the information available to a user is less than that available to the dbms. This means the dbms has leverage the user doesn't have, specifically the values of relations the user is unaware of but which depend on the user's relations or on which the user's relations depend. The data design can determine how much leverage. Full leverage is only potential until dbms'es exist that reflect such an interpretation.

    Applying adjectives like "natural" to any usage of a database by a dbms is off-base because a dbms doesn't use so-called natural language. Strictly speaking, there is nothing "natural" about a database and all attributes are artificial.

    Most data design dogma is only about query interpretation. It's possible for a user to see a projection without knowing it's a projection and this is not necessarily an illogical requirement, knowing such may not be required. The user who specifies a restriction or who can only see restrictions is not looking at the same relation value the typical data design allows. Knowing a table is a projection is only necessary if the very inefficient attitude is taken that it is users who program the dbms.

    The original question was also about updates. It is logical and should be easy for the dbms to leverage attributes that users can't see, so as to allow certain updates and prevent others, as the data designer sees fit. They might be attributes given values by literals in the catalog, aka logical constants, so as to seem to appear in base tables but that doesn't necessarily mean that those attributes appear in physical tables, Or, they could appear in physical tables which users can't see anyway, at the whim of the dbms developer.

    The tables are just a manifestion, it is really the exact relations that matter, the ones the designer chooses so as to permit some updates and not others.

    ReplyDelete

View My Stats