Saturday, September 29, 2018

Conceptual Modeling, Ontological Commitment, and the Data Model

Revised: 10/1/18

We are culturally and linguistically conditioned to view the world as consisting of objects with properties[1]. Objects in a universe of objects that share common properties are of the same type and form a class, which distinguishes them from the objects that do not share those properties. Applying a class definition to the universe (i.e., selecting out the objects with the common properties) produces a group of objects of that type.
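
As a minimal sketch of that selection step (in Python, with a hypothetical universe of objects and hypothetical property names, none of which come from the discussion above), applying a class definition amounts to picking out the objects that have all the common properties:

```python
# Hypothetical universe: each object is described by the properties it has.
UNIVERSE = {
    "alice": {"name", "salary", "hire_date"},
    "bob": {"name", "salary", "hire_date"},
    "acme": {"name", "address"},
}

# Hypothetical class definition: the properties every member must share.
EMPLOYEE_PROPERTIES = {"name", "salary", "hire_date"}


def apply_class_definition(universe, common_properties):
    """Select the objects that have all of the common properties."""
    return {
        obj
        for obj, properties in universe.items()
        if common_properties <= properties  # subset test on property sets
    }


employees = apply_class_definition(UNIVERSE, EMPLOYEE_PROPERTIES)
```

Here the object "acme" is in the universe but lacks two of the common properties, so it is not selected into the group.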

Ontology is the philosophical study of concepts that directly relate to being, existence, reality, as well as the basic categories of being, and their relationships -- questions concerning what entities exist or may be said to exist and how such entities may be grouped, related, and subdivided according to similarities and differences[2].

Note: We use 'object' in the general, not the OO, sense. Philosophical ontology should not be confused with "computer science ontology", where the term was usurped by programmers to mean a conceptual graph of directed relationships among objects (and only sometimes among object types).

Conceptual modeling (1) identifies types of real world objects and (2) formulates business rules that specify their properties and relationships[3]. It thus makes an ontological commitment, and the data model used to formalize conceptual models as logical models for computable database representation must be consistent with that commitment. Unfortunately, due to lack of foundation knowledge in the industry[4], practitioners are largely unaware of, and oblivious to, the ontological underpinnings of modeling and their implications for database practice, one reason why database practice has not progressed for the last five decades.

Tuesday, September 18, 2018

Don't Conflate/Confuse Primary Keys, PK Constraints, and Indexes

“What is the difference between an index and a key? How are they related?”

“There seems to be some confusion between what a Primary Key is, and what an Index is and how they are used. The Primary Key is a logical object. By that I mean that it simply defines a set of properties on one column or a set of columns to require that the columns which make up the primary key are unique and that none of them are null. Because they are unique and not null, these values (or value if your primary key is a single column) can then be used to identify a single row in the table every time. In most if not all database platforms the Primary Key will have an index created on it. An index on the other hand doesn’t define uniqueness. An index is used to more quickly find rows in the table based on the values which are part of the index. When you create an index within the database, you are creating a physical object which is being saved to disk.”

“A primary key by default creates a clustered index. A unique constraint/key by default creates a non-clustered index.”

“An index is a (logically) ordered list of rows. For example, an index on LastName means all values are already sorted in LastName order. Usually index rows contain far fewer columns in them than the table itself (except the clustered index, which is the table). A key is a column or columns that defines the order of an index. For example, on an index ordered by (LastName,FirstName), then LastName and FirstName are the keys. Btw, a primary key is a physical object, not a logical one. The db engine needs physical rows in order to ensure unique values in the index.”
--Difference between an index and a key?
I have recently published a paper[1], and posted a multipart series[2] on relational keys. In the latter I stated as follows:
"As a relational feature, keys can only be properly understood within the formal foundation of the RDM, which is simple set theory (SST) expressible in first order predicate logic (FOPL) adapted and applied to database management. Yet that is precisely what is ignored and dismissed in the industry -- including by the authors of SQL[3]."
I have also written extensively on widespread logical-physical confusion (LPC)[4], recently specifically in the key-index context[5]. The replies above are examples -- if any more were needed -- that validate my repeated claim of lack of foundation knowledge in the industry -- can you tell what's wrong with, and what's correct in, them?
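
One way to keep the two levels apart is a small sketch (in Python; the relation, attribute names, and helper functions are hypothetical illustrations, not any DBMS's actual implementation). The key is stated as a logical uniqueness constraint on a set of tuples, while the index is a separate lookup structure that can be built or dropped without changing what the relation asserts:

```python
# Hypothetical relation: a set of (emp_id, last_name) tuples.
employees = {
    (1, "Smith"),
    (2, "Jones"),
    (3, "Smith"),
}


def satisfies_key(relation, key_positions):
    """Logical check: no two tuples agree on all key attributes."""
    key_values = [tuple(t[i] for i in key_positions) for t in relation]
    return len(key_values) == len(set(key_values))


def build_index(relation, positions):
    """Physical access path: map attribute values to the matching tuples."""
    index = {}
    for t in relation:
        index.setdefault(tuple(t[i] for i in positions), []).append(t)
    return index


emp_id_is_key = satisfies_key(employees, [0])      # uniqueness holds
last_name_is_key = satisfies_key(employees, [1])   # two Smiths: fails
last_name_index = build_index(employees, [1])      # index on a non-key
```

In this sketch the constraint is checked with no index at all, and an index exists on an attribute that is not a key. A DBMS may choose to enforce a key by means of an index, but that is an implementation decision.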

Tuesday, September 11, 2018

Designation Property and Assertion Predicate

“A set is an identification of zero or more objects (depending on context, the terms “elements” or even “entities” may be used) that can be referred to as a group by name (usually a symbol), and which are drawn from some pre-defined universe of objects. Such objects are then said to be the members of the set. The members of a set have one or more properties in common ... One property that deserves special attention is that of designation. The sole defining property of a set can be simply that the definer of that set has explicitly designated certain (one or more) objects as all the set’s members. Each such member then has the property of having been designated as a member of a specific (e.g., named) set. We will refer to such a property as a designating property ... To make the point more explicit, for any set, an assertion of set membership has definitional priority over the necessity of any other properties being shared among the members.”[1]

Thus, set members are "drawn from some pre-defined universe of objects" on the basis of sharing common properties -- if nothing else, at least a designation property (DP) -- the defining (required) properties that distinguish objects that are set members from non-members.

In database management, base (as distinct from derived) relations are sets defined such that their members -- tuples -- represent (facts about) groups of entities that share both (1) defining (required) properties and (2) a DP[1]. Because the DP is implicit in the semantics of update operations under the Closed World Assumption (CWA)[3] and:

  • Is not included in conceptual models;
  • Is not represented as an attribute or constraint in logical models; 
  • Does not appear in R-table displays of relations with which practitioners interact (and confuse with relations[2])

and the conventional interpretation of the RDM has been silent on it, practitioners are completely unaware of it.
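
A minimal sketch may make the implicit DP more tangible (in Python; the relation, attribute values, and function names are hypothetical, and a plain set of tuples merely stands in for a base relation). Under the CWA, inserting a tuple asserts the fact it represents, deleting it retracts the assertion, and an absent tuple is taken to represent a false fact. Nothing in the tuples themselves records the designation:

```python
# Hypothetical base relation: (emp_id, name) tuples asserted to be true.
employees = set()


def insert(relation, fact):
    """Designate the entity as a member: assert the fact."""
    relation.add(fact)


def delete(relation, fact):
    """Withdraw the designation: retract the fact."""
    relation.discard(fact)


def is_asserted(relation, fact):
    """Closed World Assumption: a fact not in the relation is taken as false."""
    return fact in relation


insert(employees, (1, "Smith"))
insert(employees, (2, "Jones"))
delete(employees, (2, "Jones"))

smith_asserted = is_asserted(employees, (1, "Smith"))
jones_asserted = is_asserted(employees, (2, "Jones"))
```

The tuple for Jones is no different in form from the one for Smith; only the history of update operations determines which of the two is currently designated as a member.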