Saturday, March 9, 2019

Fourth Order Properties Part 2: Association Relations in Database Design - An Example


Part 1 outlined fundamentals of fourth order properties (4OP) of a multigroup arising from relationships among its group members due to 1:1, M:1, M1:M2, and, generally, M1:M2:M3:...:Mn relationships among the groups' entity members. Fundamentals are commonly missing from database practice, as reflected in the exchange:

“We have Building, Room, and Bed entities. Logically, if this is in the scope of some hypothetical hotel, then each one of those entities is dependent on their parent to exist ... you cannot have a bed without a room. Also, that room wouldn't exist without its parent, Building. So, why have I rarely seen this identifying relationship introduced? When I was learning databases, everything was apparently "non-identifying". When is this type of relationship necessary, if at all? I see the issue arises when that BED can exist without a BUILDING. If you were to INSERT into the BED table, you are constraint [sic] to provide a building_id, as the building_id is part of that BED's primary key. Couldn't you avoid an identifying relationship by giving each table its own surrogate primary key? Is this the correct representation  of an identifying relationship? I could avoid that by just giving each table its own ID. At the end of the day, this is about IDENTIFYING relationships, not their existence, which is how I've been logically determining if something is an "identifying relationship" If that were the case, then any 1:N relationship could be "identifying" but that's not how you define identifying or non-identifying.”
“Interesting -- I’d never heard this term before. I’ve heard it referred to as a cached ID though, as that 2nd ID isn’t required, but may be beneficial for performance purposes. For this example with 3 levels it’s not a huge joint statement, but for some systems with 12 tables the joins get unpleasant. I’ve never started a system with this additional id, but I have added one later on once the need was there and the profiling led to this being the best solution for our specific situation. Usually though, just creating a view that does the joins for me has been easier. I’ll be curious what has led others to use this approach.”
It's not really introduced because it's way more towards academic than functional.”
--Reddit.com
Knowledge of the fundamentals would have obviated the question, the ad-hoc terminology, and the answers. Note in particular how -- notwithstanding the conceptual and logical nature of the question -- the first answer typically delves directly into implementation[1]. The second answer does not merit attention, except as indicator of the sad state of the industry.

Given a conceptual model, we shall now compare the database design proposed in the question with (1) conventional industry practice that includes "embedded foreign keys", and (2) assuming a true RDBMS, the unified representation using association relations of not just M1:M2, or, generally, M1:M2:M3:...:Mn 4OP relationships, but also the 1:1 and M:1 special cases[2].


Saturday, March 2, 2019

Fourth Order Properties Part 1: Association Relations vs. Foreign Keys


 “We have Building, Room, and Bed entities. Logically, if this is in the scope of some hypothetical hotel, then each one of those entities is dependent on their parent to exist ... you cannot have a bed without a room. Also, that room wouldn't exist without its parent, Building. So, why have I rarely seen this identifying relationship introduced? When I was learning databases, everything was apparently "non-identifying". When is this type of relationship necessary, if at all? I see the issue arises when that BED can exist without a BUILDING. If you were to INSERT into the BED table, you are constraint [sic] to provide a building_id, as the building_id is part of that BED's primary key. Couldn't you avoid an identifying relationship by giving each table its own surrogate primary key? Is this the correct representation  of an identifying relationship? I could avoid that by just giving each table its own ID. At the end of the day, this is about IDENTIFYING relationships, not their existence, which is how I've been logically determining if something is an "identifying relationship" If that were the case, then any 1:N relationship could be "identifying" but that's not how you define identifying or non-identifying.”

“Interesting -- I’d never heard this term before. I’ve hears it referred to as a cached ID though, as that 2nd ID isn’t required, but may be beneficial for performance purposes. For this example with 3 levels it’s not a huge joint statement, but for some systems with 12 tables the joins get unpleasant. I’ve never started a system with this additional id, but I have added one later on once the need was there and the profiling led to this being the best solution for our specific situation. Usually though, just creating a view that does the joins for me has been easier. I’ll be curious what has led others to use this approach.”

“It's not really introduced because it's way more towards academic than functional.”

--Reddit.com

Such questions, and ad-hoc terms like "identifying relationships"[1] come up because practice is driven by intuition and experience (if any), without the benefit of foundation knowledge[2]. Whether practitioners know/like it or not, a database is a formal computable representation of an informal conceptual model[3] and, therefore, data modeling (i.e., logical database design)[4] is impossible without (1) a well-defined and complete conceptual model and (2) a formal data model with which to formalize it as a logical model[5]and the two should not be confused[6]. Otherwise all bets are off.

Here's how foundation knowledge should have informed modeling and design.

Saturday, February 16, 2019

Class, Type, Set, Relvar, and Relation


Note: This is a rewrite of a part of an older post (now redirecting here), to bring into line with McGoveran's formalization, re-interpretation, and extension of Codd's RDM[1] (the rewrite of the other part was posted last week).
“[According to Date] relvar ≠ class. [But i]n simple terms, class applies to a collection of values allowed by a predicate, regardless of whether such a collection could actually exist. Every set has a corresponding class, although a class may have no corresponding set ... in mathematical logic, a relation is a class (and trivially also a set), which contributes to confusion.”

“In modern programming parlance, class is generally distinguished from type only in that the latter refers to primitive (system-defined) data definitions, while class refers to higher-level (user-defined) data definitions. This distinction is almost arbitrary, and in some contexts, type and class are actually synonymous.”
Class, type, and set are often used interchangeably in the industry. Relations are neither class, nor type, and Date's relvars must be placed properly in their formal context. While details regarding these concepts vary with the flavor of set theory, they are sufficiently well defined to be distinguishable in each of the three formal foundations of the RDM, simple set theory (SST), mathematical relation theory, and first order predicate logic (FOPL).

Sunday, February 10, 2019

Understanding Domains and Attributes


Note: This is a rewrite of one section of an older post (page thereof now links here), to bring it into line with McGoveran's formalization, re-interpretation, and extension of Codd's RDM[1]. The rewrite of the other part will be posted next.
“I don't understand the concepts of domain and attribute in relational database modeling. Can someone give me an effective example?”

“Domain is an overloaded word in the DB lexicon. It probably should also be avoided. When one refers to an attribute domain in practice it is only referring to columns that have a check constraint on them that limit the values. Reference tables with foreign key constraints in general also fulfill the spirit of what domain attributes do outside of an RDBMS.”

“A domain in most SQL usage is essentially an alias name for an existing type + restrictions on an existing type that can be used in a column. As for an attribute, it's essentially a COLUMN in SQL, a field in other types of databases, etc.”
To the extent that practitioners are familiar with domains, they equate them with programming data types (PDT), or, at best, with SQL data types.

Test your foundation knowledge -- are domains the same as PDTs or SQL data types?

Saturday, January 19, 2019

Data and Meaning Part 4: Query and Result Correctness


As we have seen in Parts 1, 2, and 3, the RDM is a formal theory adapted and applied to database management: database relations (1) preserve the formal properties of mathematical relations, but also (2) have interpretations -- carry a real world meaning assigned by a conceptual model: facts about entities, entity groups, and multigroups (i.e., their properties, some of which are relationships, specified by business rules (BR)). A relation is formally in 5NF and constrained for semantic consistency (i.e., to represent facts about an entity group).
“When we create specific domains, relations, and attributes we are constraining (restricting) an abstract logical system to a specific interpretation (meaning). Seen the other way around, an interpretation of the logical system is a representation of a specific segment of the world, and that is exactly the purpose of database design. For example, an attribute name created by the designer is assigned meaning intended by the modeler as representing an entity property, which is the very meaning of semantics. That is why full normalization cannot be achieved or assessed without reference to some conceptual model -- what attribute names mean, and how they are related to each other (i.e., their dependencies), and so on.” --David McGoveran
Yet requesting and giving design advice without a conceptual model is routine in the industry[1]. What is more, most practitioners are oblivious to the implications for correctness of queries and results[2].

Wednesday, January 9, 2019

Data and Meaning Part 3: Database Design


We have seen in Part 2 that the meaning of data in a database is the conceptual model that the database is intended to represent, namely (1) the three types of objects -- entities of multiple types that form entity groups that form a multigroup -- and (2) the business rules (BR) that specify their properties:
  • Properties in context (PiC) shared by entities of each type;
  • Collective group properties (i.e., relationships among entity group members);
  • Multigroup properties (i.e., inter-group relationships).
Often somebody produces one or more tables and asks if there's "anything wrong" with them,  or "if they are in some specific normal form and, if not, how to normalize them". This reflects lack of foundation knowledge. 

Tuesday, January 1, 2019

Data and Meaning Part 2: Types of Business Rules

 
Per Part 1, meaning is captured during conceptual modeling as information about objects of interest, specifically their properties (some of which are relationships), specified in business rules (BR). Because they are expressed informally in natural language, objects and BRs must be formalized into computable form. Data modeling (we prefer logical database design) uses a formal data model to formalize informal conceptual models as formal logical models for database representation: it assigns the meaning in the former to symbols and expressions in the latter[2]. Using the RDM:

  • Objects -- entities, entity groups, and multigroups -- formalize as tuples, relations, and databases, respectively;
  • Properties formalize as domains, and when associated with entities of specific types, as attributes;
  • Group and multigroup properties -- relationships among entities, and among groups[3] -- formalize as constraints on and among relations enforceable by the DBMS.