Friday, June 14, 2019

Normalization and Further Normalization Part 3: Understanding Database Design




Note: This is a re-write of two older posts, to bring them into line with McGoveran's formalization, re-interpretation, and extension[1] of Codd's RDM.
 

In Part 1 we explained that for a database to be relational, database design must adhere to three core principles, in which case it consists of relations that are by definition in both 1NF and 5NF. In Part 2 we showed that whether tables visualize relations (i.e., are R-tables) can be determined only with reference to the conceptual model that the database designer intended the database to represent (not what any users might think it does). This is obscured by the common and entrenched confusion/conflation of levels of representation and, consequently, of types of model -- conceptual, logical, physical, and data model -- that we have so often debunked[2].


------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE 

Up to 2018, DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. In 2018 that website was discontinued. The content of this site is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. Thank you.

NEW

  • 05/21/19: Updated the LINKS page.
  • 04/20/19: Added POSTS page with links to all site posts, to be updated monthly.

LATEST PUBLICATIONS (order PAPERS and BOOKS) 



USING THIS SITE 

  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics may work.


SOCIAL MEDIA 

I deleted my Facebook account. You can follow me:

  • @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture posts, and my exchanges on LinkedIn.
  • @The PostWest blog: Evidence for Antisemitism/AntiZionism – the only universally acceptable hatred – as the (traditional) response to the existential crisis of decadence and decline of Western (including the US)
  • @ThePostWest Twitter page where I comment on global #Antisemitism/#AntiZionism and the Arab-Israeli conflict

------------------------------------------------------------------------------------------------------------------

Note: Confusion of levels of representation is commonly of two types[3]. Conceptual-logical conflation (CLC), for example: 
“The single most important thing, by far, is to do a data design before you do a database design. That is, a true logical data design ... by [t]hinking of "entities", not "tables".”
“... many folks do not get the difference between the Barker entity relationship style of modeling and the relational style of modeling...”
and logical-physical confusion (LPC), for example:
“Apart from normalization, implementation modeling styles have been mainly invented to overcome issues in database software, e.g. current mainstream DBMSes. They make (sometimes serious) compromises on aspects like normalization and integrity against performance and maintenance.”
There is also an awful lot of misuse and abuse of terminology in the industry -- here logical database design is referred to as "data design", and physical implementation as database design (see our terminology below).  

As we argued, the best indicator of poor foundation knowledge in the industry is that explanations that purport to correct any of a plethora of common misconceptions, are themselves plagued by those very misconceptions.
“Part of the ... confusion as to what exactly was meant by "data modeling" -- conceptual, logical or physical -- is that most data modeling activities seem to focus on achieving good relational database designs ... my approach is the portrayal of the underlying structure of an enterprise’s data -- without regard for any technology that might be used to manage it ... a "conceptual data model" ... that represents the business.”
--David Hay, LinkedIn.com
Terms like "conceptual data model" exacerbate confusion[2]. A conceptual model -- Hay is right, practitioners skimp on it (if not skip it altogether) -- must, of course, come first, otherwise what does the database represent (what does the data mean?)[4] This is why whether a table is a R-table can only be determined with reference to the the conceptual model -- what the data mean[5].

But conceptual modeling models business, not data. For database representation (i.e., computability) a conceptual model must be formalized as a logical model using a formal data model[6,7]. It is, thus, database design that models data (think of the conceptual model as the territory, the logical model as a map thereof, the map material -- paper, paint -- as physical model (implementation), and the data model as the map legend)[8,9]. For non-network applications (i.e., focused on relationships among groups of entities, rather than among individual entities), the only formally well-defined data model is the RDM, so "achieving good relational database design" is precisely what "data modeling activities" should focus on[10]. If only.

Here's an example of a mixup of all three levels of representation and corresponding modeling:

“I guess to start we need to know what you mean by a "conceptual model" and a "logical model" and also the difference between an ER model and a relational model. I would also argue that an ORM Fact model is NOT really the same thing. In ORM there is no concept of an entity record (tuple), although relational tables can be automatically generated from an ORM model (furthermore, guaranteed to be fully normalized). The inverse mapping from a relational model to an ORM model is incomplete and ambiguous (unless you assume the relational model is fully normalized, however, it is impossible for that to be known to the system). Thus they are not really the same thing. I would argue that even the act of forming records (i.e., clustering "attributes" into entity records) is a step toward physical implementation." I just want to know what are different approaches you follow for designing a conceptual, logical and physical model.”
--Gordon Everest, LinkedIn.com
  • A conceptual model of some business reality consists of business rules (BRs) -- expressed in specialized natural language understood semantically by users -- that specify defining properties of entities, groups thereof, and a multigroup of groups, some of which arise from relationships[11]. An "E/R model" is a conceptual model produced by Entity-Relationship Modeling (E/RM), variants of which are commonly used in the industry[12].
  • A logical model is a formalization of a conceptual model that can be represented in a database (i.e., a computable form thereof), whereby the BRs specifying properties arising from relationships are expressed in a formal data sublanguage as constraints "understood" algorithmically (i.e., enforceable) by a DBMS to guarantee consistency of the data with the conceptual model[13].
  • A formal data model, which has three essential components -- structure, integrity, and manipulation -- is used to formalize conceptual to logical models for database representation. E/R models can be formalized using the RDM to logical models representable in relational databases, in which case the data language is FOPL-based and relationally complete[14], although we prefer Fact Modeling (see next). Inverse mapping from a relationally designed logical model to its intended conceptual model might be possible -- with some effort -- from a complete specification of database relations and integrity constraints expressed in such a language, but is not from SQL tables and constraints[15].
  • Object-Role Modeling (ORM) is a conceptual modeling approach that we consider superior to E/RM. It is a fact-based approach rooted in linguistics -- a fact model is a conceptual model produced by ORM[16]. At the conceptual level there are neither tuples (logical), nor records (physical), but facts are about objects, namely entities and groups thereof. If a fact model is formalized as a logical model using the RDM with adherence to the three design principles, the result is a relational database consisting of relations in both 1NF and 5NF, the tuples of which are formal database representations of the facts about entity group members specified in the conceptual fact model. Otherwise put, the data is consistent with their meaning.

Note: E/RM is referred to as the "entity-relationship model", and ORM as the "object-role model". To avoid confusion of the modeling approach (E/RM, ORM) with specific applications thereof (E/R models, OR models) we prefer entity-relationship and object-role modeling over model. Similarly, we prefer relationally designed logical models over "relational models", to avoid confusing them with the RDM.

"Clustering attributes into entity records" as a "step toward physical implementation" is an excellent example of confused levels of representation:
  • facts (conceptual) formalize as tuples (logical);
  • properties in context (conceptual) formalize as attributes (logical);
  • tuples are implemented as one or more records (physical).
It is precisely to preempt such confusion that we insist on distinguishing terminology: conceptual (business) modeling, logical database design (i.e., data modeling), and physical implementation (i.e., physical modeling)[17].

Conclusion


Normalization and further normalization are purely for logical repairs of distinct database design flaws. If:

  • A proper conceptual model is formalized to a logical model using the RDM;and,
  • The three core design principles (POOD, POEC, PORP) are adhered to;
the database will be relational (i.e., will consist of relations in both 1NF and 5NF), obviating the need for either repair by normalization to 1NF, or further normalization to 5NF. Note that, as we explained, full normalization (5NF) is necessary but insufficient for relational fidelity: first, 1NF is also required, and second, while adherence to the three principles implies imply the Principle of Full Normalization (POFN), POFN does not imply the three principles. If a database is not relational, all bets are off[18].

Whether a table is a R-table can only be determined with reference to the conceptual model that the database is intended to represent.


Confusion of levels of representation and types of model inhibits understanding of conceptual and data modeling, and is responsible for poor or absent conceptual models and badly designed databases, with consequences to which practitioners are mostly oblivious.


Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.

References

[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK (draft chapters), forthcoming.

[2] Pascal, F., Data Model: Neither Business, Nor Logical, Nor Physical Model.

[3] Pascal, F., The Conceptual-Logical Conflation and the Logical-Physical Confusion.

[4] Pascal, F., Data and Meaning Parts 1-4.
http://www.dbdebunk.com/2018/11/data-and-meaning-part-1-rdm-is-applied.html

[5] Pascal, F., Don't Design Databases Without Foundation Knowledge and Conceptual Models.

[6] Pascal, F., What Is a Data Model, and What It Is Not.

[7] Pascal, F., Understanding Data Modeling Parts 1-5.

[8] Pascal, F., Understanding Conceptual vs. Data Modeling Parts 1-4.

[9] Pascal, F., Data Model: Neither Business, Nor Logical, Nor Physical Model.

[10] Pascal, F., Database Design: What It Is and Isn't.

[11] Pascal, F., Relationships and the RDM Part 1: Kinds of Relationships.

[12]
Pascal, F., E/RM Models Reality, RDM Models Data.

[13] Pascal, F., Relationships, Rules, Relations, and Constraints.

[14] Pascal, F., Natural, Programming and Data Language.

[15] Pascal, F., Normalization and Further Normalization Parts 1-3.

[16] Pascal, F., Understanding Data Modeling Part 4: Fact Modeling.

[17] Pascal, F., Levels of Representation: Conceptual Modeling, Logical Design and Physical Implementation.

[18] Pascal, F., Relational Database Design, Logical Validity and Semantic Correctness.




No comments:

Post a Comment