Friday, September 4, 2020

Oldies But Goodies: Relationships and Relations


Note: To demonstrate the stability afforded by a sound foundation relative to the industry's fad-driven cookbook practices, I am re-publishing under "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my debunkings hold up, and whether the industry has progressed beyond the misconceptions they were intended to dispel. I may break long pieces into more pointed parts, and add comments and references to further reading.

From "Little Relationship to Relational" originally posted on March 29, 2001.

“Given the depth and complexity of Codd's thought, not to mention the arcane terms in which he often expressed himself, it is not difficult to grasp why so many of his key points have been widely misunderstood. Even programmers still often misconstrue the technical term "relational". The relational in relational theory refers to relations and not relationships. A relation is a special set of similar objects commonly modeled as entities or as database tables. Relationships may exist between these relations and if your relations are entities you could easily represent the whole thing using a Relational Entity Relationship approach. To elucidate a simple practical example, if you had a company table and an employee table and each company record could have many employee records associated with it, you would have two relations and one relationship. The relations would be the sets of similar objects found in the Employee and Company tables and the relationship would be the association between them. In this case one company to many employees.”

Codd's thought was very deep indeed (new implications are still being derived from his original ideas), but one major objective of relational technology, now almost forgotten, is simplicity. There is little that is complex in relational technology; in fact, it is the simplest general approach possible, and any other general approach is more complex.

It is true that Codd, as a mathematician, did not present his ideas in a way comprehensible to the average practitioner. But it is also true that he had to use different terminology in order to distinguish his precise concepts from the fuzzy, problematic terms already in use in the industry. It is also true that, as I argued in the first editorial launching this site, practitioners are so steeped in complex implementation details, and so lacking in education in fundamentals, that they have a hard time understanding simple logical concepts. Ironically, the author of the article himself reveals some misunderstanding of his own. To clarify:

  • formally a relation is a set of tuples, representing propositions about the real world.
  • informally, a relational table can be viewed as representing an "entity type", with rows representing "entities" of that type.
But note carefully that:
  • "entity" has no precise, formal definition
  • "relationship" can and should be regarded as a special case of "entity"
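To make the formal definition concrete, here is a minimal Python sketch. It assumes a toy EMPLOYEE relation and predicate wording that are purely hypothetical. Each tuple is modeled as an unordered set of attribute-value pairs, and the relation as a set of such tuples, so neither attribute order nor repeated tuples carry any meaning. Instantiating the predicate with each tuple yields the propositions the relation asserts. The sketch illustrates the definition only and makes no claim about how any DBMS stores relations.

```python
# Sketch: a relation as a set of tuples, each tuple a set of (attribute, value) pairs.
# The EMPLOYEE data and the predicate wording are hypothetical, for illustration only.

def tup(**attrs):
    """Build a tuple as an unordered set of (attribute, value) pairs."""
    return frozenset(attrs.items())

employee = frozenset({
    tup(emp_id=1, name="Ann"),
    tup(emp_id=2, name="Bob"),
    tup(name="Ann", emp_id=1),  # same tuple, attributes listed in another order
})

def propositions(relation, predicate):
    """Instantiate the relation's predicate with each tuple's values."""
    return sorted(predicate.format(**dict(t)) for t in relation)

print(len(employee))  # 2: the repeated tuple states no additional fact
for p in propositions(employee, "Employee {emp_id} is named {name}."):
    print(p)
```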

Comments on re-publication: 
  • A relation is a relationship among domains that is semantically constrained to represent, in the database, real-world relationships within and among entity groups.
  • We no longer use "R-table" as a substitute for "relation": an R-table is a visualization of a relation on some physical medium, and it plays no role in the RDM. Note that constraints are not visible in an R-table.
  • A relationship can be (1) among entities within an entity group, in which case it is a collective property of the group and is represented by a constraint, or (2) between groups, in which case it is represented by an associative relation.
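The two cases in the last comment can be sketched in Python. The COMPANY, EMPLOYEE and EMPLOYMENT relations, their sample data, and the uniqueness rule chosen as the within-group example are all hypothetical assumptions. The sketch shows only where each kind of relationship lives. A within-group relationship is a constraint checked over the whole set. A between-group relationship is an associative relation whose tuples pair identifiers from both groups, and those tuples must refer to existing entities.

```python
# Sketch under stated assumptions: all relation names, data and rules are hypothetical.
company = frozenset({("C1", "Acme"), ("C2", "Globex")})                # (company_id, name)
employee = frozenset({("E1", "Ann"), ("E2", "Bob"), ("E3", "Cy")})     # (emp_id, name)

# (1) Relationship within a group, as a constraint on the whole set:
# here, the assumed rule that no two employees share an emp_id.
def key_unique(relation, key_pos=0):
    keys = [t[key_pos] for t in relation]
    return len(keys) == len(set(keys))

# (2) Relationship between groups, as an associative relation pairing
# identifiers from each group: (emp_id, company_id).
employment = frozenset({("E1", "C1"), ("E2", "C1"), ("E3", "C2")})

def references_valid(assoc, left, right):
    """Every associative tuple must refer to existing entities in both groups."""
    left_ids = {t[0] for t in left}
    right_ids = {t[0] for t in right}
    return all(e in left_ids and c in right_ids for e, c in assoc)

print(key_unique(employee))                                              # True
print(references_valid(employment, employee, company))                   # True
print(references_valid(employment | {("E9", "C1")}, employee, company))  # False
```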


Further Reading

The Interpretation and Representation of Database Relations (Codd 1969-70)

Logical Symmetric Access, Data Sublanguage, Kinds of Relations, Redundancy and Consistency (Codd 1969-70)

What Relations Really Are and Why They Are Important

Understanding Relations series

Levels of Representation: Conceptual Modeling, Logical Database Design and Physical Implementation

Understanding Conceptual vs. Data Modeling series

Conceptual Modeling Is Not Data Modeling

Relationships and the RDM series

Relations & Relationships

Relationships, Rules, Relations and Constraints


What Is A Database Relationship



Friday, August 28, 2020

TYFK: Denormalization Part 1 - Fundamentals


“Main Question: How do we trade-off while doing denormalization? 
  • Sub-question 1: the standard to implement
- Do we always have to denormalize a model? For what kind of project must we use denormalization techniques while others may not?
- Since denormalization has its gains and losses, how well should we denormalize a data model? Perhaps, the more complete we denormalize, the more complex, uncertain and poor the situation will be.
  • Sub-question 2: the characteristics of normalization
- Does denormalization have several levels/forms the same as that of normalization? For instance: 1DNF, 2DNF...
- Given we can denormalize a data model, it may never be restored to the original one because to do normalization, one can have many ways while to build a data model, you can have multiple choices in determining entities, attributes, etc.”

Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If your answers don't match ours, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

In Part 1 we discuss the relevant fundamentals in which we will ground the debunking in Part 2.

Monday, August 24, 2020

Oldies But Goodies: Data Sublanguage vs. SQL


Note: To demonstrate the stability of a sound foundation relative to the industry's fad-driven cookbook practices, I am re-publishing under "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my debunkings hold up and whether the industry has progressed beyond the misconceptions they were intended to dispel. I may break long pieces into more pointed parts, and add comments and references to further reading.

From "Little Relationship to Relational" originally posted on March 29, 2001.
“Crucial to making the dream of relational databases a reality was the development of a new query language which took significant advantage of relational theory. Then contemporary query languages like CODASYL usually required incredibly complicated code to answer even the simplest of questions from a database. Codd himself developed a relational language called ALPHA that he often used to communicate his ideas in an academic context [Actually, this is not true], but it attracted almost no commercial interest. In 1974 and 1975 Raymond Boyce and Don Chamberlin of IBM designed a new fourth generation language to extract information from systems based on Codd's relational model known as Structured English Query Language, or SEQUEL. This would later be shortened to SQL but is still most correctly pronounced "sequel." As a relational language, SQL was designed to be descriptive rather than procedural. SQL allowed users to specify the results they wanted rather than having to specify the procedure to obtain the desired results. Theoretically this would absolve the user from having to figure out the best way to execute the program and make information available to a whole class of non-programmers. The model's scope and elegance soon drew many converts.”

It is somewhat of a stretch to deem SQL a full-fledged data [sub-]language, and I am not sure how much advantage it actually takes of relational theory. Codd, Date and others, myself included, have amply demonstrated not only that SQL fails to support, or even violates, important relational features, but also that it is a poorly designed language. It can be argued that at least one reason relational technology has not reached its potential is that SQL was practically its only concretization (an arguably better language, QUEL, could not compete because it lacked IBM compatibility, which was critical at the time). Not only do most practitioners think that SQL DBMSs are relational, but they actually blame the relational model itself for problems caused by SQL's violations of, or lack of adherence to, relational principles!

The claim of non-procedurality (the term is declarative, not descriptive) is rather questionable. While SQL is, relatively speaking, less procedural than programming languages or the quasi-data languages that preceded it, it is by no means as declarative as it should and could have been. In many cases (particularly those in which it eschews relational features) the SQL formulation can be extremely procedural (see, for example, the chapters on normalization, redundancy and integrity in my latest book). A case in point is subqueries (the "structured" in SQL's name comes from them): they were put in the language to avoid direct expressions of relational operations such as JOIN, INTERSECT, or DIFFERENCE. Unfortunately, they make SQL a highly redundant language: the same query can be expressed in many ways. Codd warned IBM about the optimization implications of such redundancy, but was ignored. In 1983 I ran multiple formulations of the same query against several SQL PC DBMSs and got response times ranging from 2 to 2500 seconds. Ironically, not all direct expressions could be avoided (e.g., UNION); what is more, explicit expressions of all relational operations were later added to SQL anyway. So now there is double redundancy, and the optimization complications are worse!
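The redundancy is easy to reproduce. The following Python sketch uses the standard sqlite3 module and a hypothetical supplier/shipment schema (table names, columns and data are assumptions). It expresses one question, "which suppliers ship at least one part?", in four ways: as an IN subquery, an EXISTS subquery, a join and an INTERSECT. All four return the same rows. Whether a given DBMS optimizes them equally cannot be taken for granted, and this sketch measures no timings.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (sno TEXT PRIMARY KEY, city TEXT NOT NULL);
    CREATE TABLE shipment (sno TEXT NOT NULL, pno TEXT NOT NULL, PRIMARY KEY (sno, pno));
    INSERT INTO supplier VALUES ('S1', 'London'), ('S2', 'Paris'), ('S3', 'Paris');
    INSERT INTO shipment VALUES ('S1', 'P1'), ('S2', 'P1'), ('S2', 'P2');
""")

# One question, four logically equivalent SQL formulations.
formulations = {
    "IN subquery": "SELECT sno FROM supplier WHERE sno IN (SELECT sno FROM shipment)",
    "EXISTS subquery": (
        "SELECT sno FROM supplier s "
        "WHERE EXISTS (SELECT 1 FROM shipment p WHERE p.sno = s.sno)"
    ),
    # DISTINCT is needed: S2 ships two parts, so the bare join returns S2 twice.
    "JOIN": "SELECT DISTINCT s.sno FROM supplier s JOIN shipment p ON p.sno = s.sno",
    "INTERSECT": "SELECT sno FROM supplier INTERSECT SELECT sno FROM shipment",
}

results = {
    name: sorted(row[0] for row in conn.execute(sql))
    for name, sql in formulations.items()
}
for name, rows in results.items():
    print(f"{name:16} {rows}")  # every formulation prints ['S1', 'S2']
conn.close()
```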


Further Reading

Data Sublanguage series

Domains, R-tables, and SQL

Not Worth Repeating: Duplicates

To Really Understand Integrity, Don't Start with SQL

NULL Value is a Contradiction in Terms

Precision, Procedurality and SQL

DISTINCT and ORDER BY Are Not Relational
 

Language Redundancy and DBMS Performance: A SQL Story





Thursday, August 20, 2020

TYFK: Relations, Tables, Domains and Normalization


“The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation. The main highlights of this model are:
  • Data is stored in tables called relations.
  • Relations can be normalized. In normalized relations, values saved are atomic values.
  • Each row in a relation contains a unique value.
  • Each column in a relation contains values from a same domain.”



Friday, August 7, 2020

Oldies But Goodies: Data Models and Physical Independence


Note: To appreciate the stability of a sound foundation vs the industry's fad-driven cookbook practice, I am re-publishing some of the articles and reader exchanges from the old DBDebunk.com (2000-06), giving you the opportunity to judge for yourself how well my claims/arguments hold up and whether the industry has progressed at all. I am adding comments on re-publication where necessary. Long pieces are broken into smaller parts for fast reading.

From "Little Relationship to Relational" originally posted on March 29, 2001.

 
“E.F. ("Ted") Codd conceived of his relational model for databases while working at IBM in 1969. Codd's approach took a clue from first-order predicate logic, the basis of a large number of other mathematical systems and presented itself [sic] in terms of set theory, leaving the physical definition of the data undefined and implementation dependent. In June of 1970, Codd laid down much of his extensive groundwork for the model in his article, "A Relational Model of Data for Large Shared Data Banks" published in the Communications of the ACM, a highly regarded professional journal published by the Association for Computing Machinery. Buoyed by an intense reaction against the ad hoc data models offered by the physically oriented mainframe databases, Codd's rigid separation of the logical model, with its rigorous mathematical underpinnings, from the less elegant realities of hardware engineering was revolutionary in its day. Codd and his relational ideas blazed across the academic computing landscape over the next few years.”

Monday, July 20, 2020

Oldies But Goodies: Data Independence and "Physical Denormalization"


Note: I am re-publishing some of the articles and reader exchanges from the old DBDebunk (2000-06). How well do they hold up -- have industry knowledge and practice progressed? Judge for yourself and appreciate the difference between a sound foundation and the fad-driven cookbook approach.


January 2, 2001


ML: ... one of the "4 great lies" is "I denormalize for performance." You state that normalization is a logical concept and, since performance is a physical concept, denormalization for performance reasons is impossible (i.e., it doesn't make sense). What term would you use to describe changing the physical database design to be different from the logical design to enhance performance? Because normalization is a logical concept, you imply that this is not called denormalization.

Friday, July 10, 2020

Oldies But Goodies: Skyscrapers with Shack Foundations

Ed. Note: I am re-publishing some of the posts (slightly refined) from the old dbdebunk (2000-06) to demonstrate how well they hold up to this day, and how little has improved in the industry; if anything, the opposite. The following is the first editorial with which I started dbdebunk in 2000.
 

June 4, 2000

“Well, it's really a judgment call and I think a lot of experience comes into it. It's a little bit like building a shack. Say you want to build a skyscraper, and you started out building a shack and you just keep trying to add onto it. After a while you have this severe structural problem ... So there is a fallacy to the build-upon-a-simple structure approach. Sometimes you get up to three stories and you have to do some major structural changes, and I just accept that.”
--Wayne Ratliffe, developer of dBase
“Client Servers were a tremendous mistake. And we are sorry that we sold it to you. Instead of applications running on the desktop and data sitting on the server, everything will be Internet based. The only things running on the desktop will be a browser and a word processor. What people want is simple, inexpensive hardware that functions as a window on to the Net. The PC was ludicrously complex with stacks of manuals, helplines and IT support needed to make it function. Client server was supposed to alleviate this problem, but it was a step in the wrong direction. We are paying through the nose to be ignorant.”
--Larry Ellison, CEO, Oracle Corp.
