Friday, December 20, 2019

The RDM and Model Stability





“3rd normal form data models in data warehousing efforts struggle when changes impact parent child relationships. These impacts cause cascading changes to the data model, the queries, and the loading processes. [For example:]
  • There are bank accounts
  • Each account belongs to exactly one customer
  • A customer can have more than one account
The bank introduces a new product: joint accounts, which means an account can now have more than one owner. It is clear that the 3NF model has to be extended in order to keep this new information; the data vault models seems to be able to fulfill the new requirement.

Some banks propose joint accounts, some don’t, therefore some use M:N relation between client and accounts and others 1:N. A model which is good for any possible case is actually awful model because it describes nothing: by looking at this model you can’t say if joint accounts exist among bank's products.”

--Data Vault and Model (in)Stability

Data warehousing/vault[1] are a red herring here -- the real issue is data independence. Some corrections and clarifications first:

  • Normal forms do not pertain to the data model itself -- the RDM -- but to relations in logical models created using strictly the RDM[2].
  • 3NF is insufficient -- relations are in 5NF by definition, otherwise correctness is not guaranteed[3].
  • The RDM was introduced as a database representation superior to old directed graph -- hierarchic and network (CODASYL) -- systems for conceptual models focused on relationships among entity groups, rather than among individual entities[4]. Graph database representation (nodes and edges) corresponds to a worldview at the conceptual level of parents-children (network) relationships, of which parent-children (hierarchy) is a special case. The relational representation (relations) corresponds to M:N relationships among entity groups, of which M:1 is a special case[5].

Note: Correctness -- logical and semantic[6] -- requires adherence to three principles of database design that jointly imply 5NF[7].

View My Stats