Saturday, September 29, 2018

Conceptual Modeling, Ontological Commitment, and the Data Model

Revised: 10/1/18

We are culturally and linguistically conditioned to view the world as consisting of objects with properties[1]. Objects in a universe thereof that share common properties are of the same type and form a class, distinguishing them from the objects that are not and do not. Applying a class definition to the universe (i.e., selecting out the objects with the common properties) produces a group of objects of that type.

Ontology is the philosophical study of concepts that directly relate to being, existence, reality, as well as the basic categories of being, and their relationships -- questions concerning what entities exist or may be said to exist and how such entities may be grouped, related, and subdivided according to similarities and differences[2].

Note: We use 'object' in the general, not OO sense. Philosophical ontology should not be confused with "computer science ontology", whereby the term ontology was usurped, and was used by programmers to mean a conceptual graph of directed relationships among objects (and only sometimes among object types).

Conceptual modeling (1) identifies types of real world objects and (2) formulates business rules that specify their properties and relationships[3] and, thus, makes an ontological commitment that the data model used to formalize conceptual models as logical models for computable database representation must be consistent with. Unfortunately, due to lack of foundation knowledge in the industry[4], practitioners are largely unaware of, and oblivious to the ontological underpinning of modeling and their implications for database practice, one reason why it has not progressed for the last five decades.

Tuesday, September 18, 2018

Don't Conflate/Confuse Primary Keys, PK Constraints, and Indexes

“What is the difference between an index and a key? How are they related?”

“There seams to be some confusion between what a Primary Key is, and what an Index is and how they are used. The Primary Key is a logical object. By that I mean that is simply defines a set of properties on one column or a set of columns to require that the columns which make up the primary key are unique and that none of them are null. Because they are unique and not null, these values (or value if your primary key is a single column) can then be used to identify a single row in the table every time. In most if not all database platforms the Primary Key will have an index created on it. An index on the other hand doesn’t define uniqueness. An index is used to more quickly find rows in the table based on the values which are part of the index. When you create an index within the database, you are creating a physical object which is being saved to disk.”

“A primary key by default creates a clustered index. A unique constraint/key by default creates a non-clustered index.”

“An index is a (logically) ordered list of rows. For example, an index on LastName means all values are already sorted in LastName order. Usually index rows contain far fewer columns in them than the table itself (except the clustered index, which is the table). A key is a column or columns that defines the order of an index. For example, on an index ordered by (LastName,FirstName), then LastName and FirstName are the keys. Btw, a primary key is a physical object, not a logical one. The db engine needs physical rows in order to insure unique values in the index.”
--Difference between an index and a key?,
I have recently published a paper[1], and posted a multipart series[2] on relational keys. In the latter I stated as follows:
"As a relational feature, keys can only be properly understood within the formal foundation of the RDM, which is simple set theory (SST) expressible in first order predicate logic (FOPL) adapted and applied to database management. Yet that is precisely what is ignored and dismissed in the industry -- including by the authors of SQL[3]."
I have also written extensively on widespread logical-physical confusion (LPC)[4], recently specifically in the key-index context[5]. The replies above are examples -- if any more were needed -- that validate my repeated claim of lack of foundation knowledge in the industry -- can you tell what's wrong with, and what's correct in, them?

Tuesday, September 11, 2018

Designation Property and Assertion Predicate

“A set is an identification of zero or more objects (depending on context, the terms “elements” or even “entities” may be used) that can be referred to as a group by name (usually a symbol), and which are drawn from some pre-defined universe of objects. Such objects are then said to be the members of the set. The members of a set have one or more properties in common ... One property that deserves special attention is that of designation. The sole defining property of a set can be simply that the definer of that set has explicitly designated certain (one or more) objects as all the set’s members. Each such member then has the property of having been designated as a member of a specific (e.g., named) set. We will refer to such a property as a designating property ... To make the point more explicit, for any set, an assertion of set membership has definitional priority over the necessity of any other properties being shared among the members.”[1]

Thus, set members are "drawn from some pre-defined universe of objects" on the basis of sharing common properties -- if nothing else, at least a designation property (DP) -- the defining (required) properties that distinguish objects that are set members from non-members.

In database management, base (as distinct from derived) relations are sets defined such that their members -- tuples -- represent (facts about) groups of entities that share both (1) defining (required) properties and (2) a DP[1]. Because the DP i
s implicit in the semantics of update operations under the Closed World Assumption (CWA)[3] and:

  • Is not included in conceptual models;
  • Is not represented as an attribute or constraint in logical models; 
  • Does not appear in R-table displays of relations with which practitioners interact (and confuse with relations[2])

and the conventional interpretation of the RDM has been silent on it, practitioners are completely unaware of it.

Wednesday, August 29, 2018

DISTINCT and ORDER BY Are Not Relational

“One of the things that confuse SQL users all the time is how DISTINCT and ORDER BY are related in a SQL query ... most people quickly understand:
FROM film

[that] returns results in an arbitrary order, because the database can (and might apply hashing rather than ordering to remove duplicates) ... Most people also understand:
SELECT length
FROM film
ORDER BY length

[that] will give us duplicates, but in order ... And, of course, we can combine the two:
FROM film
ORDER BY length

[But if] somewhat intuitively, we may want to order the lengths differently, e.g. by title:
FROM film
ORDER BY title

[m]ost databases [sic] fail this query with an exception like Oracle’s:

ORA-01791: not a SELECTed expression

At first sight ... this

SELECT length
FROM film
ORDER BY title

works after all ... So, how are these different? We have to rewind and check out the logical order of SQL operations (as opposed to the syntactic order). And always remember, this is the logical order, not the actual order executed by the optimiser.”
--How SQL DISTINCT and ORDER BY are Related,

Wednesday, August 15, 2018

Order Is For Society, Not Databases

8/18/18: I have re-written this post for a better explanation. If you read it prior to the revision, you should re-read it.
“I learned that there is no concept of order in terms of tuples (e.g. rows) in a table, but according to wikipedia "a tuple is an ordered list of elements". Does that mean that attributes do have an order? If yes why would they be treated differently, couldn't one add another column to a table (which is why the tuples don't have order)? [OTOH], "In this notation, attribute–value pairs may appear in any order." Does this mean attributes have no order?”
--Do the “columns” in a table in a RMDB have order?
“Is it possible to reorder rows in SQL database? For example, how can I swap the order of 2nd row and 3rd row's values? The order of the row is important to me since i need to display the value according to the order [and] 'Order by' won't work for me. For example, I put a list of bookmarks in database. I want to display based on the result I get from query. (not in alphabet order). Just when they are inserted. But user may re-arrange the position of the bookmark (in any way he/she wants). So I can't use 'order by'. An example is how the bookmark display in the bookmark in firefox. User can switch position easily. How can I mention that in DB?”
--How can I reorder rows in sql database

While some data professionals may know that rows and columns of "database tables" are "unordered", few of them know what that means, and understand why. This is due to two, not unrelated, of the many common misconceptions[1] rooted in the lack of foundation knowledge in the industry, namely that relational databases consist of tables[2], and logical-physical confusion (LPC)[3]. They obscure understanding of the RDM and its practical implications, which is reflected in the answers to the above questions. Instead of debunking them, this post fills the gap in knowledge such that you can debunk them yourself -- try it before and after you read it.

Sunday, August 5, 2018

No Such Thing As "Primary Key Tuning"

“The choice of good InnoDB primary keys is a critical performance tuning decision. This post will guide you through the steps of choosing the best primary key depending on your workload ... You would be surprised how many times I had to explain the importance of primary keys and how many debates I had around the topic as often people have preconceived ideas that translate into doing things a certain way without further thinking.”
--Yves Trudeau, Principal architect,

I will be labeled "pedantic" and a "purist" for saying this, but there is no such thing as "PK tuning". "Choice of good PK for performance" reflects logical-physical confusion (LPC), which, in turn, is rooted in lack of foundation knowledge, and failure to grasp the RDM -- not just by database practitioners, but even by DBMS designers.

Monday, July 30, 2018

Lenin, Trotsky, Data Management, and the Tyranny of Knowledge and Reason -- Version 2

14 years ago I published an editorial with this title in response to a exchange triggered by one of my articles at The objective was to illustrate some of the damage done by the collapse of education to the IT industry, reflecting a trend in the society  as a whole. Things have only gotten worse since then, and for quite a while I considered a re-write. A recent Linkedin exchange provided the impetus, so here it is.