Showing posts with label LPC.

Monday, March 5, 2018

Physical Independence Part 1: Don't Mix Model with Implementation




Note: This is a rewrite of several older posts (which now link here), to bring them into line with the McGoveran formalization and interpretation [1] of Codd's true RDM.

"You constantly remind us that the relational model is a logical model having no connection to any physical model (so I infer). You also indicate how no commercial product fully implements the relational model. Therefore, how do we make use of the relational model when dealing with the physical constructs of a commercial database program (Oracle, Access, DB2, etc.)?" --DBDebunk.com query

Friday, December 29, 2017

DBMS for Analytics: Risky Business Without Foundation Knowledge, Part 2




Note: This was originally posted at AllAnalytics, which no longer exists, so some links to other posts there no longer work, but I left them in to alert the reader that I have written on those specific subjects. Other links work.  

In this two-part series I alert analysts that correct interpretation and assessment of media/industry claims requires a good grasp of data fundamentals. In Part 1, I discussed the logical-physical confusion and the erroneous misclassification of DBMSs as relational and non-relational that underlie the argument that the latter are superior to the former for analytics applications. In Part 2, I discuss a third misconception behind the claim.

Sunday, December 3, 2017

DBMS for Analytics: Risky Business Without Foundation Knowledge (Part 1)




Note: This was originally posted at AllAnalytics, which no longer exists, so some links to other posts there no longer work, but I left them in to alert the reader that I have written on those specific subjects. Other links work.
 

A new study finding that "non-relational database management systems now compris[e] 70% of analytics data sources" attributes their popularity to "superiority" over RDBMSs in satisfying analytics needs. There are good reasons to be skeptical of such findings, but even if this one were true, the arguments advanced in support of the claim are rooted in the usual misconceptions due to poor foundation knowledge debunked by this blog. Let's see.

Wednesday, November 8, 2017

Understanding Conceptual vs. Data Modeling Part 1: Data Model - The RDM Is, the E/RM Isn't



Re-write 10/16/18
“E/RM is a data model -- So says Date, Chen, etc. So says the majority of current industry experts ... With very strong references to Codd (who he worked with), Date elegantly explains the differences between RM and E/RM -- but clearly believes both are data models (even allowing for the charitable comment). If we take a RDB as the ultimate target implementation of data, and an E/RM (or extended) can correctly design all the artifacts that are implemented, this means it is modeling the data. Granted, an E/RM does not explicitly model some of the non-structural aspects of the original Codd definition.”

“Out of interest, is there a common Relational Modeling tool, that is not also an E/RM tool and models the full Codd definition? There are also several other methods of modeling data -- E/RM is more a mechanism to represent the data. If E/RMs are used by IT professionals across the world to direct the design and build of the majority of applications guided by standard methodologies, is the view of this argument that these were all buil[t] wrongly? Regardless of success? Is the inferred conclusion that only the RM models data, and ERM, [or] any other techniques do not? [If so] that is a little limiting.”

Objects, Properties, and Ontological Commitment


We are culturally and linguistically conditioned to conceptualize the world as objects with properties. Objects in a universe thereof that share common properties are of the same type and form a class, distinguishing them from objects that do not. Applying a class definition to the universe selects out of it the group of objects of that type.

Philosophical ontology is the study of being, existence, reality, as well as the basic categories of being, and their relationships -- what entities exist or may be said to exist, and how they may be grouped, related, and subdivided according to similarities and differences. 

Note: 'Object' is used in the general, not OO sense. Ontology, as used herein, should not be confused with "computer science ontology", whereby the term ontology was usurped, and is understood by programmers as meaning a conceptual graph of directed semantic relationships among objects (and only sometimes among object types).

Conceptual modeling (1) identifies types of objects of interest, and (2) formulates business rules (BRs) that specify their properties and relationships; as such, it makes an ontological commitment. Any approach to conceptual modeling must consider the ontological commitment upon which it is based, which has major implications for the data model used to formalize conceptual models as logical models for computable database representation: the data model must be consistent with that commitment.

Unfortunately, due to lack of foundation knowledge in the industry [1], practitioners -- both vendors and users -- are largely unaware of, and oblivious to, ontological underpinnings and their implications for database technology and practice, one reason why both have not only stagnated, but regressed over the last five decades. In this multipart series we explain the important distinction between conceptual modeling and data modeling (aka logical database design), the latter of which requires a formal data model. The E/RM is not one, though it can be used for conceptual modeling -- modeling of reality, not data. We then outline a new conceptual modeling approach that makes a different ontological commitment and requires adjustments to the RDM, both necessary for genuine progress.



Sunday, August 27, 2017

Object Orientation, Relational Database Design, Logical Validity and Semantic Correctness



Note: This is an 8/24/17 rewrite of a 5/20/13 post to bring it in line with McGoveran's formal exposition of Codd's RDM [1] and its correct interpretation.

08/25/17: I have added formal definitions of logical validity and semantic correctness. 
09/01/17: Minor revisions. 
09/02/17: Added references.
03/15/18: Minor revisions.


Here's what's wrong with last week's picture, namely:
"In my experience, using an object model in both the application layer and in the database layer results in an inefficient system. This are my personal design goals:
- Use a relational data model for storage
- Design the database tables using relational rules including 3rd normal form
- Tables should mirror logical objects, but any object may encompass multiple tables
- Application objects, whether you are using an OO language or a traditional language using structured programming techniques, should parallel application needs which [more] closely correspond to individual SQL statements than to tables or "objects". --LinkedIn.com

Tuesday, August 1, 2017

Structure, Integrity, Manipulation: How to Compare Data Models




The IT industry operates like the fashion industry: every few years -- and the number keeps getting smaller -- a "new" data technology pops up, with vendors, the trade media and various "experts" all stepping over each other to claim that it'll "revolutionize your business" and that unless you jump on the bandwagon, you'll be "left behind." But time and again these prove to be fads lacking a sound foundation. Huge resources are invested in migrations from fad to fad, rather than in productive work (Don't believe the hype about Hadoop usage; Basta, Big Data, It's Time to Say Arrivederci). Remember?
"Hadoop seems to take over relational database, as Hbase can store even unstructured data whereas relational data warehouse limits to structured data ... handles traditional structured data just fine, albeit in a different way than a RDBMS ... EDW vendors [will] incorporate Hadoop framework into their core architectures to enable advanced and high performance analytics."

Sunday, June 11, 2017

What Meaning Means: Business Rules, Predicates, Integrity Constraints and Database Consistency




Note: This is a 10/23/17 rewrite of a 7/29/12 post to bring it in line with the McGoveran interpretation [1] of Codd's true RDM.

To understand what's wrong with the picture of two weeks ago, namely:
"If we step back and look at what RDBMS is, we’ll no doubt be able to conclude that, as its name suggests (i.e., Relational Database Management System), it is a system that specializes in managing the data in a relational fashion. Nothing more. Folks, it’s important to keep in mind that it manages the data, not the MEANING of the data! And if you really need a parallel, RDBMS is much more akin to a word processor than to an operating system. A word processor (such as the much maligned MS Word, or a much nicer WordPress, for example) specializes in managing words. It does not specialize in managing the meaning of the words ... So who is then responsible for managing the meaning of the words? It’s the author, who else? Why should we tolerate RDBMS opinions on our data? We’re the masters, RDBMS is the servant, it should shut up and serve. End of discussion." --Alex Bunardzic, Should Database Manage The Meaning?
it helps to consider the quote in the context of another article by the author, "The Myth of Data Integrity", where he reveals that those "DBMS opinions" are constraints (the article has been deleted, but a few comments remain online and are highly recommended for a feel of the consequences of lack of foundation knowledge).
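
To make the rebuttal concrete, here is a minimal sketch -- the table and business rule are hypothetical, not from either article -- of how a DBMS "manages meaning": a business rule is formalized as declarative constraints, which the DBMS enforces on every insert and update:

-- Hypothetical business rule: every product has a name, and a
-- discount can never be negative. Declared to the DBMS, the rule
-- is enforced by the system itself, not by the "author".
CREATE TABLE products (
  product_id   INT PRIMARY KEY,
  product_name VARCHAR(50) NOT NULL,
  discount     DECIMAL(5,2) NOT NULL,
  CONSTRAINT discount_nonnegative CHECK (discount >= 0)
);

Those constraints are precisely the "opinions" the author wants the DBMS to shut up about: the database representation of what the data means.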

Monday, January 16, 2017

Don't Design Databases Without Foundation Knowledge and Conceptual Models




"I have two tables, one is product which is a parent table with one primary key and I have another child table of product, which is a product_details table. But the child table is linking with parent table(product) with logical data instead of foreign key,as we are doing this relationship with the help of java code in the coding side, instead of depending on the data base, which make it as tight couple. To avoid tight coupling between the tables we are storing the primary key value in the child table.
CREATE TABLE `tbl_product` (
  `product_id` varchar(200) NOT NULL,
  `product_details_id` varchar(200) DEFAULT NULL,
  `currency` varchar(20) DEFAULT NULL,
  `lead_time` varchar(20) DEFAULT NULL,
  `brand_id` varchar(20) DEFAULT NULL,
  `manufacturer_id` varchar(150) DEFAULT NULL,
  `category_id` varchar(200) DEFAULT NULL,
  `units` varchar(20) DEFAULT NULL,
  `transit_time` varchar(20) DEFAULT NULL,
  `delivery_terms` varchar(20) DEFAULT NULL,
  `payment_terms` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`product_id`));

CREATE TABLE `tbl_product_details` (
  `product_details_id` varchar(200) NOT NULL,
  `product_id` varchar(200) DEFAULT NULL,
  `product_name` varchar(50) DEFAULT NULL,
  `landingPageImage` varchar(100) DEFAULT NULL,
  `product_description_brief` text CHARACTER SET latin1,
  `product_description_short` text CHARACTER SET latin1,
  `product_price_range` varchar(50) DEFAULT NULL,
  `product_discount_price` varchar(20) DEFAULT NULL,
  `production_Type` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`product_details_id`),
  UNIQUE KEY `product_id` (`product_id`));
Please suggest the Pros and Cons of the design, we are following this kind of relationship in my company, as the manager is saying it will give [us flexibility]. I know that if we lose the data from the table, we can't know the relationship between the two tables."--StackExchange.com
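
Before weighing pros and cons, here is a minimal sketch of what the quoted design omits -- declaring the relationship to the DBMS itself (the constraint name is mine), using the table definitions above:

-- The relationship the Java code enforces, declared instead to the DBMS:
-- orphaned product_details rows and deletion of a still-referenced product
-- are now rejected by the system, with no application code involved.
ALTER TABLE `tbl_product_details`
  ADD CONSTRAINT `fk_product_details_product`
  FOREIGN KEY (`product_id`) REFERENCES `tbl_product` (`product_id`);

With the foreign key declared, losing "the data from the table" cannot sever the relationship, because the DBMS guarantees referential integrity regardless of what any application does.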

Monday, October 24, 2016

The Costly Illusion of Denormalization for Performance



My October post at AllAnalytics.

Be that as it may, practitioners insist that performance improves when they denormalize databases, because "bundling" facts into fewer relations reduces joins. But even if this were always true -- it is not -- performance gains, if any, do not come from denormalization per se, but from trading off integrity for performance. What many data professionals miss is that the redundancy introduced by denormalization must be controlled by the DBMS to ensure data integrity, which requires special integrity constraints that, it turns out, involve the very joins denormalization is intended to avoid, defeating its purpose. These constraints are practically never declared and enforced, which creates the illusion that denormalization improves performance at no cost.
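
A minimal sketch of the trade-off, on hypothetical tables: denormalization repeats each customer's city in every one of his order rows, and the constraint controlling that redundancy involves the very join that was supposedly "saved":

-- Hypothetical denormalized table: customer attributes repeated per order.
CREATE TABLE customer_orders (
  order_id      INT PRIMARY KEY,
  customer_id   INT NOT NULL,
  customer_city VARCHAR(50) NOT NULL,
  amount        DECIMAL(10,2) NOT NULL
);

-- The redundancy-control constraint: all rows for a customer must agree
-- on the city. Standard SQL expresses it as an assertion -- which SQL
-- DBMSs practically do not support -- and it performs a self-join:
CREATE ASSERTION customer_city_consistent CHECK (
  NOT EXISTS (
    SELECT 1
    FROM customer_orders a
    JOIN customer_orders b ON a.customer_id = b.customer_id
    WHERE a.customer_city <> b.customer_city
  )
);

Left undeclared -- as it practically always is -- the "performance gain" is paid for with corruptible data.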

Read it all. (Please comment there, not here)



Sunday, June 12, 2016

Levels of Representation: Conceptual Modeling, Logical Design and Physical Implementation



From last week:

What's wrong with this picture? (Kinds of Data Models, LinkedIn.com)

David Hay: "Part of the ... confusion as to what exactly was meant by “data modeling”--conceptual, logical or physical--is that most data modeling activities seem to focus on achieving good relational database designs ... my approach is the portrayal of the underlying structure of an enterprise’s data--without regard for any technology that might be used to manage it ... a “conceptual data model” ... that represents the business."
Nothing makes it harder to decide whether to laugh or cry than attempts to dispel confusion that suffer from the very confusion they purport to dispel.

Sunday, January 3, 2016

NoSQL and SQL: A Plague on Both Their Houses



Oracle Defends Relational DBs Against NoSQL Competitors prompted Does Oracle Really Understand NoSQL?, which was shared on LinkedIn and triggered a LinkedIn exchange in which I participated. The following comment is an adequate summary of the second article:
JN: It's an unfortunate bit of propaganda. Some truth mixed in with distractions and irrelevant comparison. I've met the DataStax team. They're smart people with a solid understanding of their space. I'm disappointed to see them mix good and bad information into something that looks like objective truth.
and the first article is not much better. The interested reader can visit all three links. What I want to do here is expand on some of my LinkedIn comments and add a few more.

Saturday, December 1, 2012

Data Warehouses and the Logical-Physical Confusion



(Erwin Smout is co-author of this post.)

Revised 8/26/18

In Implementation Data Modeling Styles Martijn Evers writes:
"Business Intelligence specialists are often on the lookout for better way to solve their data modeling issues. This is especially true for Data warehouse initiatives where performance, flexibility and temporalization are primary concerns. They often wonder which approach to use, should it be Anchor Modeling, Data Vault, Dimensional or still Normalized (or NoSQL solutions, which we will not cover here)? These are modeling techniques focus around implementation considerations for Information system development. They are usually packed with an approach to design certain classes of information systems (like Data warehouses) or are being used in very specific OLTP system design. The techniques focus around physical design issues like performance and data model management sometimes together with logical/conceptual design issues like standardization, temporalization and inheritance/subtyping."

"Implementation Data Modeling techniques (also called physical data modeling techniques) come in a variety of forms. Their connection is a desire to pose modeling directives on the implemented data model to overcome several limitations of current SQL DBMSes. While they also might address logical/conceptual considerations, they should not be treated like a conceptual or logical data model. Their concern is implementation. Albeit often abstracted from specific SQL DBMS platforms they nonetheless need to concern themselves with implementation considerations on the main SQL platforms like Oracle and Microsoft SQL Server. These techniques can be thought of as a set of transformations from a more conceptual model (usually envisaged as an ER diagram on a certain 'logical/conceptual' level but see this post for more info on "logical" data models)."
