Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).
“The relational calculus is good in describing sets. But it´s bad at describing relations between data in different sets. Explicit identities (primary keys) need to be introduced and normalization is needed to avoid update inconsistencies due to duplication of data. To say it somewhat bluntly: The problem with the relational calculus and RDBMS etc. is the focus on data. It´s seems to be so important to store the data, that connecting the data moves to the background. That might be close to how we store filled in paper forms. But it´s so unlike how the mind works. There is no data stored in your brain. If you look at the fridge in your kitchen, there is no tiny fridge created in your brain so you can take the memory of your fridge with you, when you leave your kitchen.” --Weblogs.asp.net
The lack of foundation knowledge exposed by the above paragraph is so complete that its claims are practically upside down and backwards.
Fundamentals
As we have demonstrated, in mathematical set theory a relation (set) is a subset of a cross-product of domains (sets). In other words, it is a set that is a relationship among sets. Being abstract (i.e., having no real-world meaning), the values of mathematical relations can be arbitrary.
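As a minimal sketch of this definition (the domains and values are hypothetical, chosen only for illustration, and Python sets merely stand in for mathematical sets), a relation can be checked to be a subset of the cross-product of its domains:

```python
from itertools import product

# Hypothetical domains (sets); the values are arbitrary, as in mathematics.
names = {"ann", "bob"}
ages = {30, 40}

# The cross-product is the set of all possible (name, age) pairs.
cross_product = set(product(names, ages))

# A mathematical relation is any subset of that cross-product.
relation = {("ann", 30), ("bob", 40)}

# True when every pair of the relation is drawn from the cross-product.
is_relation = relation <= cross_product
```

Any other subset of the cross-product, including the empty set, would qualify equally well, which is the sense in which the values of a mathematical relation are arbitrary.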
The RDM is an application of simple set theory expressible in first order predicate logic (SST/FOPL) to database management: a relational database represents a conceptual model of some reality, namely (facts about) a multigroup in the real world -- a collection of related entity groups -- with each database relation representing one such group; a database is, thus, also a set of related relations. The values in database relations (i.e., the data) are, therefore, not arbitrary, but must be consistent with the conceptual model. Relations and the database as a whole are semantically constrained to ensure that consistency, the constraints representing (1) individual properties of entities and (2) collective properties of (a) groups (i.e., relationships among entities within groups), and (b) the multigroup (i.e., relationships among groups).
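The three kinds of constraint can be sketched as follows. This is an illustration under stated assumptions, not a DBMS implementation: the employee and department groups, the attribute names, and the specific rules (positive salary, unique employee number, existing department) are all hypothetical.

```python
def tup(**attrs):
    """Build a hashable tuple: a set of (attribute, value) pairs."""
    return frozenset(attrs.items())

# Hypothetical relations, each a set of tuples representing one entity group.
departments = {tup(dept_no=10), tup(dept_no=20)}
employees = {
    tup(emp_no=1, salary=5000, dept_no=10),
    tup(emp_no=2, salary=6000, dept_no=20),
}

def individual_ok(relation):
    """(1) Individual property (assumed rule): every salary is positive."""
    return all(dict(t)["salary"] > 0 for t in relation)

def within_group_ok(relation):
    """(2a) Within-group relationship (assumed rule): emp_no values are unique."""
    emp_nos = [dict(t)["emp_no"] for t in relation]
    return len(emp_nos) == len(set(emp_nos))

def multigroup_ok(emps, depts):
    """(2b) Relationship among groups (assumed rule): each department exists."""
    dept_nos = {dict(t)["dept_no"] for t in depts}
    return all(dict(t)["dept_no"] in dept_nos for t in emps)

# The database is consistent only if all three kinds of constraint hold.
consistent = (
    individual_ok(employees)
    and within_group_ok(employees)
    and multigroup_ok(employees, departments)
)
```

Each function is a predicate over the data, so a violation of any one of them means the data no longer represents the conceptual model.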
A primary key (PK) represents names given in the real world to entities of a given type, and the corresponding PK constraint (uniqueness) enforces consistency of a relation with the real-world distinguishability of the entities about which it represents facts. These are not RDM artifacts, but rather part of the adaptation of SST/FOPL to database management.
For the primary advantage of the RDM -- guaranteed correctness of query results (i.e., inferences made from the database) -- to materialize, logical database design must adhere to three core principles which, jointly, imply fully normalized relations (5NF). In fact, in the RDM relations are in 5NF by definition; otherwise they are not relations, relational algebra (RA) operations lose information, and all bets are off.
The RA is the manipulative component of the RDM -- a collection of primitive and derived set operations on relations that describe relationships among relations. For example, the join operation r1 JOIN r2 describes a relationship between relations r1 and r2, the result itself being a relation. Note that the result of a RA operation on even a single relation is always a relation, so the operation still describes a relationship -- between the "input" and "output" relations.
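The join example can be sketched in a few lines. This is a minimal illustration, not a DBMS implementation: the helper names `tup` and `natural_join`, the relations, and their attributes are hypothetical, and JOIN is assumed here to mean the natural join (matching on attributes the two relations have in common).

```python
def tup(**attrs):
    """Build a hashable tuple: a set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def natural_join(r1, r2):
    """Return the natural join of two relations (sets of tuples)."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            common = d1.keys() & d2.keys()
            # Pair two tuples only if they agree on every common attribute.
            if all(d1[a] == d2[a] for a in common):
                result.add(frozenset({**d1, **d2}.items()))
    return result

# Hypothetical relations r1 and r2.
employees = {tup(emp_no=1, dept_no=10), tup(emp_no=2, dept_no=20)}
departments = {
    tup(dept_no=10, dept_name="Sales"),
    tup(dept_no=20, dept_name="R&D"),
}

# The result is itself a relation: a set of tuples.
joined = natural_join(employees, departments)
```

Because the result is again a set of tuples, it can serve as input to further operations, which is what makes the algebra closed over relations.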
A data model -- and, industry claims notwithstanding, the only one satisfying Codd's definition that has been formalized is the RDM -- is by nature focused on data. However, the RDM supports physical independence (PI) and is, thus, not concerned with how data is physically stored and accessed. The comparison to "how we store filled in paper forms" is an example of the common and entrenched logical-physical confusion (LPC) due to failure to understand the distinction between a logical relation and its tabular visualization on a physical medium, induced/reinforced by the industry's "direct image" implementation of SQL DBMSs.
Conclusion
We rephrase the above paragraph as follows:
“The relational algebra describes relationships among relations (sets). Primary keys are one of the adaptations of SST/FOPL for database management: a PK constraint -- uniqueness -- formally represents in the database a within-group relationship among all the group's entities.
Three core design principles, adherence to which is mandatory, jointly imply full normalization, which is necessary to guarantee correctness of query results. True RDBMSs:
- Implement the RA for logical data retrieval independent of how the data is physically stored and accessed. SQL DBMSs notwithstanding, vendors are free to store data whichever way they want as long as they don't expose it to users in applications.
- Enforce relational constraints that are formal database representations of relationships in the conceptual model represented by the database.”
The "brain" stuff is sheer nonsense.
 
 