Follow @DBDebunk
Follow @ThePostWest
See: https://www.dbdebunk.com/2018/09/designation-property-and-assertion.html
Showing posts with label Pred. Show all posts
Showing posts with label Pred. Show all posts
Tuesday, September 11, 2018
Sunday, July 15, 2018
Understanding Relations Part 3: Debunking Conventional Wisdom
Follow @DBDebunk
Follow @ThePostWest
(See Part 1 and Part 2)
It is easy to discern when explanations of relational features are not grounded in the formal foundations of the RDM[1], but in industry practices. Here are some further clarifications and corrections.
(See Part 1 and Part 2)
“A common term used in database design is a "relational database" -- but a database relation is not the same thing and does not imply, as its name suggests, a relationship between tables. Rather, a database relation simply refers to an individual table in a relational database. In a relational database, the table is a relation because it stores the relation between data in its column-row format. The columns are the table's attributes, while the rows represent the data records. A single row is known as a tuple to database designers.”
“A relation, or table, in a relational database has certain properties.”
“First off, its name must be unique in the database, i.e. a database cannot contain multiple tables of the same name.”
“Next ... as with the table names, no attributes can have the same name.”
“Next, no tuple (or row) can be a duplicate. In practice, a database might actually contain duplicate rows, but there should be practices in place to avoid this, such as the use of unique primary keys (next up). Given that a tuple cannot be a duplicate, it follows that a relation must contain at least one attribute (or column) that identifies each tuple (or row) uniquely. This is usually the primary key. This primary key cannot be duplicated. This means that no tuple can have the same unique, primary key. The key cannot have a NULL value, which simply means that the value must be known.”
“Further, each cell, or field, must contain a single value. For example, you cannot enter something like "Tom Smith" and expect the database to understand that you have a first and last name; rather, the database will understand that the value of that cell is exactly what has been entered.”
“Finally, all attributes—or columns—must be of the same domain, meaning that they must have the same data type. You cannot mix a string and a number in a single cell.”
“All these properties, or constraints, serve to ensure data integrity, important to maintain the accuracy of data.” --Definition of Database Relation
It is easy to discern when explanations of relational features are not grounded in the formal foundations of the RDM[1], but in industry practices. Here are some further clarifications and corrections.
Sunday, June 24, 2018
Understanding Relations Part 1: Tables? So What?
Follow @DBDebunk
Follow @ThePostWest
Note: This is a re-write of two older posts (which now link here), to bring them into line with the McGoveran formalization and interpretation of Codd's real RDM, including his own refinements, corrections, and extensions[1]
That such a simple and commonly understood structure can visualize relations is an advantage of the RDM, but a table is not a relation and, SQL notwithstanding, confusing the two reflects a lack of understanding of the RDM, misses its significance for database practice, and prevents taking full advantage of its benefits.
Note: The table is the preferred way to picture relations, there are others (e.g., array).
First, the fundamentals.
Note: This is a re-write of two older posts (which now link here), to bring them into line with the McGoveran formalization and interpretation of Codd's real RDM, including his own refinements, corrections, and extensions[1]
“Put simply, a "relation" is a table, the heading being the definition of the structure and the rows being the data.”
“In simple English: relation is data in tabular format with fixed number of columns and data type of each column. This can be a table, a view, a result of a subquery or a function etc.”
“Practically, a "Relation" in relational model can be considered as a "Table" in actual RDBMS products(Oracle, SQL Server, MySQL, etc), and "Tuples" in a relation can also be considered as "Rows" or "Records" in a table.”
“In common usage, however, when someone refers to a "relation" in a database course, they are referring to a tabular set of data either permanently stored in the database (a table) or derived from tables according to a mathematical description (a view or a query result).”
“In SQL RDBMSes (such as MS SQL Server and Oracle] tables are permently stored relations, where the column names defined in the data dictionary form the "heading" and the rows are the "tuples" of the relation. Then from a table, a query can return a different relation.”
“Data is stored in two-dimensional tables consisting of columns (fields) and rows (records). Multi-dimensional data is represented by a system of relationships among two-dimensional tables.”
“I read [that] "Relations are multidimensional. They are not flat. They are not two dimensional. Don't let the term table mislead you." on the back cover of CJ Date's DATABASE IN DEPTH. Can anyone help how to visualize this multidimensional nature of relations?”Because SQL DBMSs have been sold as relational databases (which they are not), and in SQL the data structure is the table, in the absence of foundation knowledge[2] most practitioners think that relational databases consist of tables, but do not ask themselves why and how is that significant for database practice. The subtitle of this post is a question I used to ask in presentations years ago that always got silence. I see no evidence of improvement -- in fact, it's gotten worse. To emulate Feynman, "Nobody understands the RDM".
That such a simple and commonly understood structure can visualize relations is an advantage of the RDM, but a table is not a relation and, SQL notwithstanding, confusing the two reflects a lack of understanding of the RDM, misses its significance for database practice, and prevents taking full advantage of its benefits.
Note: The table is the preferred way to picture relations, there are others (e.g., array).
First, the fundamentals.
Saturday, April 7, 2018
Name the Relational Violation Part 2: Self-defeating Constraint
Follow @DBDebunk
Follow @ThePostWest
Note: This two part series is a rewrite of of an older post (which now links here), to bring it into line with the McGoveran formalization and interpretation [1] of Codd's true RDM.
(Continued from Part 1)
In Part 1 I how several data practitioners failed to pinpoint the relational violation by a a conditional uniqueness constraint that should have been obvious with foundation knowledge. The closest one came was "more than one kind of business entity here [that] share the same properties (not attributes)", but still missed the implications.
Note: This two part series is a rewrite of of an older post (which now links here), to bring it into line with the McGoveran formalization and interpretation [1] of Codd's true RDM.
(Continued from Part 1)
In Part 1 I how several data practitioners failed to pinpoint the relational violation by a a conditional uniqueness constraint that should have been obvious with foundation knowledge. The closest one came was "more than one kind of business entity here [that] share the same properties (not attributes)", but still missed the implications.
Sunday, October 29, 2017
Database Design: What It Is and Isn't
Follow @DBDebunk
Follow @ThePostWest
Revised 10/31/17.
Note: Posts starting with this one will be consistent with the TERMINOLOGY page. Fundamental terms -- the grasp of which is necessary for data management practice -- will be boldened. When you encounter one you don't understand, better find out what it means, chances are it's being misused or abused. Once the page is finalized, labels and, time permitting, old posts may also be revised accordingly.
Reference [9] is an important rewrite and is recommended pre-requisite
for this post that you should read first.
Here's what's wrong with the picture of three weeks ago, namely:
The three design steps above are vague, somewhat confused and obscure more than enlighten. They do not reflect the fact that database design is formalization of a conceptual model of reality as relations constrained to be consistent with the business rules the model consists of.
Revised 10/31/17.
Note: Posts starting with this one will be consistent with the TERMINOLOGY page. Fundamental terms -- the grasp of which is necessary for data management practice -- will be boldened. When you encounter one you don't understand, better find out what it means, chances are it's being misused or abused. Once the page is finalized, labels and, time permitting, old posts may also be revised accordingly.
Reference [9] is an important rewrite and is recommended pre-requisite
for this post that you should read first.
Here's what's wrong with the picture of three weeks ago, namely:
"The term database design can be used to describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views. In an object database the entities and relationships map directly to object classes and named relationships. However, the term database design could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the database management system(DBMS).Many problems in database practice are due to failure to grasp what a data model is and the important distinctions between DBMS functions on the one hand and application functions on the other.
The process of doing database design generally consists of a number of steps which will be carried out by the database designer. Usually, the designer must:
Within the relational model the final step above can generally be broken down into two further steps, that of determining the grouping of information within the system, generally determining what are the basic objects about which information is being stored, and then determining the relationships between these groups of information, or objects." --Halil Lacevic, What is a Relational Database?, Quora.com
- Determine the data to be stored in the database.
- Determine the relationships between the different data elements.
- Superimpose a logical structure upon the data on the basis of these relationships.
The three design steps above are vague, somewhat confused and obscure more than enlighten. They do not reflect the fact that database design is formalization of a conceptual model of reality as relations constrained to be consistent with the business rules the model consists of.
Saturday, May 13, 2017
To Really Understand Integrity, Don't Start with SQL
Follow @DBDebunk
Follow @ThePostWest
Here's what's wrong with the picture of two weeks ago, namely:
SQL inhibits understanding, so if you want to really understand integrity, don't start with SQL. Instead, educate yourself on relational integrity and put your SQL DBMS's features in that context--how correctly and completely does it support all the necessary constraints? Then you can (1) make sure that the constraints that it does support are enforced and (2) be aware of those that it does not, the potential risks thereof and the defensive actions to be taken, if necessary. A bonus is that you will finally realize one of the many important differences between SQL DBMSs and a true RDBMS, which are confused in the industry[1].
I dare you to find this information anywhere else!!!!
Here's what's wrong with the picture of two weeks ago, namely:
"Constraints are categorized as follows:Despite being a critical database function, integrity is insufficiently understood and appreciated. Few practitioners know much beyond just awareness of primary key and referential constraints and question even their necessity.
Constraints are always attached to a column not a table."
- Domain integrity Constraints
- Entity integrity Constraints
- Referential integrity Constraints
- Not null
- Unique
- Foreign key
- Check
- Primary key
--Dayakar, SQL Constraints
SQL inhibits understanding, so if you want to really understand integrity, don't start with SQL. Instead, educate yourself on relational integrity and put your SQL DBMS's features in that context--how correctly and completely does it support all the necessary constraints? Then you can (1) make sure that the constraints that it does support are enforced and (2) be aware of those that it does not, the potential risks thereof and the defensive actions to be taken, if necessary. A bonus is that you will finally realize one of the many important differences between SQL DBMSs and a true RDBMS, which are confused in the industry[1].
I dare you to find this information anywhere else!!!!
Monday, January 16, 2017
Don't Design Databases Without Foundation Knowledge and Conceptual Models
Follow @DBDebunk
Follow @ThePostWest
"I have two tables, one is product which is a parent table with one primary key and I have another child table of product, which is a product_details table. But the child table is linking with parent table(product) with logical data instead of foreign key,as we are doing this relationship with the help of java code in the coding side, instead of depending on the data base, which make it as tight couple. To avoid tight coupling between the tables we are storing the primary key value in the child table.
CREATE TABLE `tbl_product` (
`product_id` varchar(200) NOT NULL,
`product_details_id` varchar(200) DEFAULT NULL,
`currency` varchar(20) DEFAULT NULL,
`lead_time` varchar(20) DEFAULT NULL,
`brand_id` varchar(20) DEFAULT NULL,
`manufacturer_id` varchar(150) DEFAULT NULL,
`category_id` varchar(200) DEFAULT NULL,
`units` varchar(20) DEFAULT NULL,
`transit_time` varchar(20) DEFAULT NULL,
`delivery_terms` varchar(20) DEFAULT NULL,
`payment_terms` varchar(20) DEFAULT NULL,
PRIMARY KEY (`product_id`));
CREATE TABLE `tbl_product_details` (
`product_details_id` varchar(200) NOT NULL,
`product_id` varchar(200) DEFAULT NULL,
`product_name` varchar(50) DEFAULT NULL,
`landingPageImage` varchar(100) DEFAULT NULL,
`product_description_brief` text CHARACTER SET latin1,
`product_description_short` text CHARACTER SET latin1,
`product_price_range` varchar(50) DEFAULT NULL,
`product_discount_price` varchar(20) DEFAULT NULL,
`production_Type` varchar(20) DEFAULT NULL,
PRIMARY KEY (`product_details_id`),
UNIQUE KEY `product_id` (`product_id`));
Please suggest the Pros and Cons of the design, we are following this kind of relationship in my company, as the manager is saying it will give [us flexibility]. I know that if we lose the data from the table, we can't know the relationship between the two tables."--StackExchange.com
Tuesday, December 20, 2016
On View Updating (C. J. Date and D. McGoveran)
Follow @DBDebunk
Follow @ThePostWest
My recent posts on denormalization [1], identical relations [2] and the POOD [3,4,5] based on D. McGoveran (DMG) interpretation of Codd's RDM, triggered online reactions [6,7] and some comments in place that reflect the current understanding of the RDM. One of my readers referred me back to a 2004 exchange triggered by an exchange @old dbdebunk.com with both CJD and DMG on view updating--an important aspect discussed in my posts--on which the two perspectives differ. Last week I asked what's wrong with CJD's position in the exchange. Here is the original exchange, albeit in abbreviated form (I made minor revisions for clarity and added references to more recent sources the reader may want to consult, such as the 2013 CJD book on the subject [8], which also purports to describe some of DMG's more recent thinking), in which DMG counters with his position and adds a new comment. Throughout, I've changed "relvar predicate" (still used by CJD) to "relation predicate", as preferred in the DMG interpretation.
My recent posts on denormalization [1], identical relations [2] and the POOD [3,4,5] based on D. McGoveran (DMG) interpretation of Codd's RDM, triggered online reactions [6,7] and some comments in place that reflect the current understanding of the RDM. One of my readers referred me back to a 2004 exchange triggered by an exchange @old dbdebunk.com with both CJD and DMG on view updating--an important aspect discussed in my posts--on which the two perspectives differ. Last week I asked what's wrong with CJD's position in the exchange. Here is the original exchange, albeit in abbreviated form (I made minor revisions for clarity and added references to more recent sources the reader may want to consult, such as the 2013 CJD book on the subject [8], which also purports to describe some of DMG's more recent thinking), in which DMG counters with his position and adds a new comment. Throughout, I've changed "relvar predicate" (still used by CJD) to "relation predicate", as preferred in the DMG interpretation.
Sunday, November 20, 2016
The Principle of Orthogonal Database Design Part III
Follow @DBDebunk
Follow @ThePostWest
Note: This is a 11/24/17 re-write of Part III of a three-part series that replaced several previous posts (the pages of which redirect here), to to bring it in line with the McGoveran formalization and interpretation [1] of Codd's true RDM.
(Continued from Part II)
As we have seen, if relations are uniquely constrained, with a true RDBMS supporting logical independence (LI) and constraint inheritance, database design can adhere to the POOD and enable DBMS-enforced consistency. A RDBMS can also support ESS explicitly.
Industry misconceptions notwithstanding, SQL DBMSs are, of course, not relational. They have weak declarative integrity support, which, coupled with bad database designs, makes adherence to the POOD (as well as the other design principles) difficult. While even its weak relational fidelity was sufficient to render SQL superior to what preceded it, this is but one example of the many advantages of the RDM that SQL has failed to concretize.
Note: This is a 11/24/17 re-write of Part III of a three-part series that replaced several previous posts (the pages of which redirect here), to to bring it in line with the McGoveran formalization and interpretation [1] of Codd's true RDM.
(Continued from Part II)
POOD and SQL
As we have seen, if relations are uniquely constrained, with a true RDBMS supporting logical independence (LI) and constraint inheritance, database design can adhere to the POOD and enable DBMS-enforced consistency. A RDBMS can also support ESS explicitly.
Industry misconceptions notwithstanding, SQL DBMSs are, of course, not relational. They have weak declarative integrity support, which, coupled with bad database designs, makes adherence to the POOD (as well as the other design principles) difficult. While even its weak relational fidelity was sufficient to render SQL superior to what preceded it, this is but one example of the many advantages of the RDM that SQL has failed to concretize.
Monday, November 7, 2016
The Principle of Orthogonal Database Design Part II
Follow @DBDebunk
Follow @ThePostWest
Note: This is a 11/24/17 re-write of Part II of a three-part series that replaced several previous posts (the pages of which redirect here), to bring in line with the McGoveran formalization and interpretation [1] of Codd's true RDM.
(Continued from Part I)
To recall from Part I, adherence to the POOD means independent base relations (i.e., not derivable from other base relations), which the design example in Part I,
Note: This is a 11/24/17 re-write of Part II of a three-part series that replaced several previous posts (the pages of which redirect here), to bring in line with the McGoveran formalization and interpretation [1] of Codd's true RDM.
(Continued from Part I)
To recall from Part I, adherence to the POOD means independent base relations (i.e., not derivable from other base relations), which the design example in Part I,
EMPS (EMP#,ENAME,HIREDATE)violates: EMPS is derivable via union of projections of SAL_EMPS and COMM_EMPS. It requires at least:
SAL_EMPS (EMP#,ENAME,HIREDATE,SALARY)
COMM_EMPS (EMP#,ENAME,HIREDATE,COMMISSION)
- A disjunctive constraint on each of the SAL_EMPS and COMM_EMPS relations, to ensure mutual exclusivity;
- A redundancy control constraint to prevent inconsistency due to partial updates (can you formulate it?);
- Use of the transaction management component of the data language to ensure that each candidate tuple is properly inserted (1) into EMPS and (2) the correct subtype relation;
Monday, September 19, 2016
The Principle of Orthogonal Database Design Part I
Follow @DBDebunk
Follow @ThePostWest
Note: This is a 11/24/17 re-write of Part I of a three-part series that replaced several older posts (the pages of which which now redirect here), to bring in line with the McGoveran formalization and interpretation [1] of Codd's true RDM.
Note: This is a 11/24/17 re-write of Part I of a three-part series that replaced several older posts (the pages of which which now redirect here), to bring in line with the McGoveran formalization and interpretation [1] of Codd's true RDM.
"The principle of orthogonal design (abbreviated POOD) ... is the second of the two principles of database design, which seek to prevent databases from being too complicated or redundant, the first principle being the principle of full normalization (POFN). Simply put, it says that no two relations in a relational database should be defined in such a way that they can represent the same facts. As with database normalization, POOD serves to eliminate uncontrolled storage redundancy and expressive ambiguity, especially useful for applying updates to virtual relations (views). Although simple in concept, POOD is frequently misunderstood ... is a restatement of the requirement that a database is a minimum cover set of the relational algebra. The relational algebra allows data duplication in the relations that are the elements of the algebra. One of the efficiency requirements of a database is that there be no data duplication. This requirement is met by the minimum cover set of the relational algebra." --Wikipedia.orgWell, not quite.
Saturday, September 3, 2016
Wednesday, January 27, 2016
It’s Not Tables, It's the Relationships
Follow @DBDebunk
Follow @ThePostWest
My January post @All Analytics.
"Logical refers to the relationships among the components of the relation, not to any arrangement of the components of a relation. Any presentation that preserves those relationships and adds no extra ones is acceptable. An R-table is one possible such presentation. The problem is that people fixate on this one presentation, identifying it with relation. They then go even further and force the physical implementation of a relation to be table-like." -- David McGoveran
Read it all. (Please comment there, not here)
My January post @All Analytics.
"Logical refers to the relationships among the components of the relation, not to any arrangement of the components of a relation. Any presentation that preserves those relationships and adds no extra ones is acceptable. An R-table is one possible such presentation. The problem is that people fixate on this one presentation, identifying it with relation. They then go even further and force the physical implementation of a relation to be table-like." -- David McGoveran
Read it all. (Please comment there, not here)
Sunday, November 1, 2015
Re-write
Follow @DBDebunk
Follow @ThePostWest
See series starting at:
See series starting at:
The Principle of Orthogonal Database Design Part I
Sunday, November 16, 2014
Wednesday, July 30, 2014
Big Data & Analytics: Table Interpretations
Follow @DBDebunk
Follow @ThePostWest
My July Post @All Analytics
When, for analytical purposes, you combine data extracted from database tables whose real-world meaning you don't know, you're asking for trouble. The meaning, after all, isn't in the tables. Rather, the meaning is contained in the business rules on the basis of which the tables were designed and which are approximated in the database with integrity constraints. Interpreting results on the basis of just visual inspection of the tables therefore involves guesswork and is almost certain to be wrong.
Read it all. (And please comment there, not here)
My July Post @All Analytics
When, for analytical purposes, you combine data extracted from database tables whose real-world meaning you don't know, you're asking for trouble. The meaning, after all, isn't in the tables. Rather, the meaning is contained in the business rules on the basis of which the tables were designed and which are approximated in the database with integrity constraints. Interpreting results on the basis of just visual inspection of the tables therefore involves guesswork and is almost certain to be wrong.
Read it all. (And please comment there, not here)
Sunday, April 27, 2014
UPDATE 2: David McGoveran: Comments on Jim Starkey's "Is the Relational Data Model Spent?"
Follow @DBDebunk
Follow @ThePostWest
UPDATE 1: I have added Jim Starkey's reply to David's initial response (with my brief comments) and David's reply to it below.
UPDATE 2: I have made a few minor corrections and fixed end-note formatting problems.
UPDATE 1: I have added Jim Starkey's reply to David's initial response (with my brief comments) and David's reply to it below.
UPDATE 2: I have made a few minor corrections and fixed end-note formatting problems.
David McGoveran's First Response
© 2014 David McGoveran – All Rights Reserved
© 2014 David McGoveran – All Rights Reserved
Jim Starkey's opinions in Is the relational model spent?, a LinkedIn exchange he initiated, reflect those of many professionals
who have used and even developed SQL DBMSs and their predecessors. While the
concerns with so-called "commercial relational database systems"
expressed by Jim are valid, they have nothing to do with the relational (data)
model. They are the result of DBMS implementations by those who borrowed something
from the relational model, but never understood it and so did not know how to
take advantage of it to solve application problems.
Monday, March 24, 2014
Simplicity: Forgotten, Misunderstood, Underrated Relational Objective
Follow @DBDebunk
Follow @ThePostWest
In a LinkedIn exchange I argued that an optimal generality-to-simplicity ratio (ability to represent a maximal range of reality with minimal complexity) and a 1:1 correspondence between informal business modeling constructs and formal logical database constructs are beneficial. And I claimed that insofar as data models that are formally defined are concerned, the relational model scores best on both.
One of the responses I got was
In a LinkedIn exchange I argued that an optimal generality-to-simplicity ratio (ability to represent a maximal range of reality with minimal complexity) and a 1:1 correspondence between informal business modeling constructs and formal logical database constructs are beneficial. And I claimed that insofar as data models that are formally defined are concerned, the relational model scores best on both.
One of the responses I got was
GE: Though I might come up with slightly different lists, in general, I agree with your expression of criteria for selecting a primary key and of generality and simplicity, but disagree with your conclusion that "RM scores better than any other modeling scheme."Let me take his points one by one.
Sunday, May 12, 2013
Re-write
Follow @DBDebunk
Follow @ThePostWest
See series starting at
See series starting at
The First Normal Form in Theory and Practice Part I.
Sunday, March 3, 2013
Re-write
Follow @DBDebunk
Follow @ThePostWest
See series starting at:
See series starting at:
The Principle of Orthogonal Database Design Part I
Subscribe to:
Posts (Atom)