Monday, January 16, 2017

Don't Design Databases Without Foundation Knowledge and Conceptual Models




"I have two tables, one is product which is a parent table with one primary key and I have another child table of product, which is a product_details table. But the child table is linking with parent table(product) with logical data instead of foreign key,as we are doing this relationship with the help of java code in the coding side, instead of depending on the data base, which make it as tight couple. To avoid tight coupling between the tables we are storing the primary key value in the child table.
CREATE TABLE `tbl_product` (
  `product_id` varchar(200) NOT NULL,
  `product_details_id` varchar(200) DEFAULT NULL,
  `currency` varchar(20) DEFAULT NULL,
  `lead_time` varchar(20) DEFAULT NULL,
  `brand_id` varchar(20) DEFAULT NULL,
  `manufacturer_id` varchar(150) DEFAULT NULL,
  `category_id` varchar(200) DEFAULT NULL,
  `units` varchar(20) DEFAULT NULL,
  `transit_time` varchar(20) DEFAULT NULL,
  `delivery_terms` varchar(20) DEFAULT NULL,
  `payment_terms` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`product_id`));

CREATE TABLE `tbl_product_details` (
  `product_details_id` varchar(200) NOT NULL,
  `product_id` varchar(200) DEFAULT NULL,
  `product_name` varchar(50) DEFAULT NULL,
  `landingPageImage` varchar(100) DEFAULT NULL,
  `product_description_brief` text CHARACTER SET latin1,
  `product_description_short` text CHARACTER SET latin1,
  `product_price_range` varchar(50) DEFAULT NULL,
  `product_discount_price` varchar(20) DEFAULT NULL,
  `production_Type` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`product_details_id`),
  UNIQUE KEY `product_id` (`product_id`));
Please suggest the Pros and Cons of the design, we are following this kind of relationship in my company, as the manager is saying it will give [us flexibility]. I know that if we lose the data from the table, we can't know the relationship between the two tables."--StackExchange.com

--------------------------------------------------------------------------------
I have been using the proceeds from my monthly blog @AllAnalytics to maintain DBDebunk and keep it free. Unfortunately, AllAnalytics has been discontinued. I appeal to my readers, particularly regular ones: If you deem this site worthy of continuing, please support its upkeep. A regular monthly contribution will ensure this unique material unavailable anywhere else will continue to be free. A generous reader has offered to match all contributions, so please take advantage of his generosity. Thanks.
--------------------------------------------------------------------------------- 

This is an example of the common problem I so frequently alerted to--database design without a clear and complete specification of a conceptual model. Such cases are indicators of lack of the minimal foundation knowledge required for proper database design and providing advice is a fool's errand. 

Relational design must adhere to the Information Principle (IP) and every database relation must be at least normalized (in 1NF), which means that every relation represents an entity type and the values of its attributes are drawn from simple domains i.e., whose values have no meaningful components. Without the business rules that specify entity types and  properties, there is no basis for design and that precedes constraints, including foreign key (FK) constraints.

Logical-Physical Confusion


Because both PRODUCT_DETAILS_ID and PRODUCT_ID are embedded in both tables, it looks like the tables have a 1:1 relationship, in which case why there's two of them? It appears that the intention is to go from product to product details or vice-versa (the PK-FK relationship already gives that capability--they just don't know how to use it) in Java code and that they are thinking of access as record-at-a-time: there is no provision for multiple detail rows per product and the PK is PRODUCT_DETAILS_ID, not (PRODUCT_ID,PRODUCT_DETAILS_ID). The notion of tight coupling is very confused and ill-analyzed--the embedding makes this design physical: the details aren't needed most of the time so they want to save I/O, which explains the two tables. This is another way of saying that it suffers from logical-physical confusion (LPC).

This is correctly pointed out by the first reply:
"Relational design is not easy [and] requires detailed analysis of the business domain to be modeled, a full understanding of the business rules that define consistent data, and a careful mapping to the logical level of R-Tables. All too often what we see in practice is what I call file based design, where tables in the SQL DBMS are used to represent files whose content is based upon totally ad-hoc design considerations. A clue that the two tables in question--product and product details--were designed using file based design is in the name of the PRODUCT DETAILS table. A file holds details about something. Product details [are not entities of a type]. This is also evidenced by the fact that every one of those columns in that table are defined as NULL."
But relational design is not harder than non-relational design. In fact, it is easier because 
  1. It forces one to know and reason clearly and thoroughly about reality and its representation in the database and
  2. The precision and proper design enable the RDBMS to guarantee data integrity and correctness of query results.
It requires foundation knowledge.

The first reply continues:
"If you want to use relational principles in database design then certainly not. In a relational design, each entity type in the business domain from the conceptual model is represented by an R-Table in the logical model. The R-Table is a table which, when following a specific discipline, can acquire the properties of a mathematical relational which enable the R-Table to be operated upon with logic statements of arbitrary complexity in an algebraic fashion with guaranteed results. Entity types at the conceptual level share a set of properties called attributes which describe them. One or more of these properties is defined to uniquely identify each entity of the class in order to tell them apart. These are mapped to the R-Table at the logical level as columns, and each Entity mapped to a row. Each row then represents a predicate, and when values for the columns are entered the predicate is instantiated to become a proposition.

A key component at the conceptual level are the business rules which define exactly what attribute values constitute a true proposition. These map to constraints in the logical model, which is a key benefit the relational model provides. When created, the constraints enable the DBMS, which can only manipulate symbols, to effectively keep the data entered consistent with the truth of the real world the data is meant to represent."

Relations and R-tables


The LPC is due in part to the common failure to distinguish between tables and relations.

A database relation is a mathematical relation (a specific kind of set) adjusted for applicability to database management. A R-table is just one way to visualize a relation on paper or screen. But just because a relation can be visualized as a R-table does not mean that the data is physically stored as one i.e., a sequential file. It does not have to be and often isn't which is what physical independence (PI) means. As E. F. Codd wrote: "The physical arrangement of columns and rows on a medium is not an essential part of the relational view of data".

The discipline a R-table obeys:
  • Unordered, unique rows without missing values;
  • Uniquely named, unordered columns.
reflects the properties of the relation it represents.

Business Rules, Predicates and Integrity Constraints


Entities of a type share properties which are represented at the logical level by a relation's attributes (visualized as columns). Information about each entity--the set of its property values--is represented by a tuple (visualized as a row). Entities are distinguishable in the real world by a combination of one or more properties and the attribute(s) representing it in the database form a key. Each tuple, thus, represents a fact about an entity which, when formalized as a statement in propositional logic, corresponds not to a relation predicate (RP), but to an instance of the RP--a proposition--such that when values for the attributes replace the variables in the RP, the RP is instantiated to become a (true) proposition. 

Natural language business rules (of which there are several types) that have been formalized in first order predicate logic (FOPL) are referred to as constraints because they constrain the propositions to those that are instances of the RP--they don't violate the rules. The constraints together partially define the RP associated with the relation that represents a corresponding entity type in the real world. Integrity constraints are constraints expressed in a specific (FOPL-based) data language. Thus:

  • Propositions must satisfy the constraints i.e., be instances thereof and true;
  • Tuples must satisfy the integrity constraints expressed in a sufficiently powerful specific data language.
Because a fragment of FOPL is equivalent to simple set theory (SST), statements in a FOPL-based data language can describe relations and express set operations on them (which describe result relations) that can be executed by a RDBMS in response to user query requests. The collection of basic set operations and combinations thereof to which relations are amenable are part of the relational algebra.

Enforcing constraints, including referential constraints by applications (here, Java), rather than by the DBMS in the database is a regress to the pre-relational and even pre-database days, a complex, redundant, error-prone and costly development and maintenance burden. That it is still considered after more than six decades of database management is evidence of the sad state of the database field. Consider the second reply:
"FOREIGN KEYs are not a requirement--they are a convenience. Without them you need a INDEX and you need to either replace the "consistency checking" they provide, or ignore the problem."
It mistates the purpose and necessity of FKs, confuses logical FKs with physical indexes and introduces a further confusion between design with FKs (a structural issue) and consistency checking (an integrity issue). While FKs are necessary for referential integrity, they should arise only in designing the relation and are never to be embedded as a convenience (e.g., for cross references, to reduce I/O, or to support some artificial integrity constraint).

Note: I do not publish anonymous comments. Please stand behind your comments, otherwise don't bother.



No comments:

Post a Comment

View My Stats