Wednesday, February 7, 2024

New Paper Published



 New, completely re-written. Available to order here



Table of Contents

Series Preface

Abstract

Introduction

1. Entities, Properties, Names, and Identification

2. Relational Representation

3. Relational Keys

    3.1. Kinds of Keys

           3.1.1. Candidate, Primary and Alternate Keys

           3.1.2. Natural and Surrogate Keys

4. Formal Primary Key Mandate

5. Primary Key Designation

6. Keys and Constraints

7. Keys and Performance

   7.1. Keys and Indexes

Conclusion

References

Appendix A: Keys in SQL

Appendix B: Duplicates and Consequences

B1. Duplicate Interpretation

B2. Duplicates in SQL

      B2.1. Duplicates and Correctness

      B2.2. Duplicates and Query Nestability

      B2.3. Language Redundancy & Duplicates

      B2.4. Duplicates and Performance Optimization

      B2.5. Duplicate Removal

Monday, February 5, 2024

METALOGICAL PROPERTIES Part 2: Assertion Predicate



In Part 1 we introduced in the conceptual model (CM) the metalogical designation property. It represents—in the absence of known shared defining properties of an entity type, the designation by a group's definer that an entity identifier (aka assigned name) or property value is a member of the group. Such a group is not a group of entities, but a group of name and property values. In the logical model (LM), it is formalized as a designation predicate (DP) and defines a domain.

In Part 2, we introduce the metalogical assertion property. It represents the assertion by an authorized database user that a specific entity, represented by a tuple, either does or does not correspond to an actual entity in the real world.

Tuesday, January 9, 2024

METALOGICAL PROPERTIES PART 1: Designation Property



 with David McGovern

One purpose of our contributions here is to suggest a vocabulary that avoids confusion not just within the formal logical level, but also between conceptual and logical terminologies, which is widespread in the industry and is exacerbated by limitations of natural language (NL). We use the following terminology in our approach to conceptual modeling:

  • Objects are:

- Primitive (basic entities);

- Compound:

  - groups of related entities;

  - multigroups (groups of related groups);

  • Properties are:

- Individual (of basic entities);

- Collective:

  - Of groups: relationships among entities within a group;

  - Of multigroups: relationships among groups within a multigroup.

 

Note:  It is a McGoveran insight that relationships between objects at a lower aggregate level are properties of the object at the higher aggregate level which the former comprise (LOGIC FOR SERIOUS DATABASE FOLK, forthcoming; see draft chapters) http://www.alternativetech.com/ATpubs_dir.html For classification of properties as first, second, third and fourth order (1OP, 2OP, 3OP and 4OP) see RELATIONSHIPS AND THE RDM Parts 1-3. https://www.dbdebunk.com/2023/03/relationships-and-rdm-v2-part-1.html All such properties can be expressed logically in a FOPL-based relational data sublanguage as constraints, which is beyond the scope of this discussion.

------------------------------------------------------------------------------------------------------------------

SUPPORT THIS SITE
DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.

LATEST POSTS

09/17 DBDEBUNK REFRESH

08/17 ENTITIES, PROPERTIES AND CODD'S SLEIGHT OF HAND

07/31 ON RELATIONAL KEYS (& DOMAINS) (t&n)

UPDATES

08/13/23 Added Good explanation of 'class' and 'type' to the LINKS  page

LATEST PUBLICATIONS (order from PUBS pages)

First Normal Form: A Definitive Guide

USING THIS SITE

  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the SEARCH  page. For detailed instructions on how to understand and use the labels in conjunction with that page, see the ABOUTpage. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
  • The links to my AllAnalytics columns no longer work. I re-published only the 2017 columns @dbdebunk, and within them links to sources external to AllAnalytics may or may not work.

SOCIAL MEDIA
I deleted my Facebook account. You can follow me @DBDdebunk on Twitter

------------------------------------------------------------------------------------------------------------------

 Group Membership

 In a conceptual model (CM), a group is a collection of one or more primitive objects—referred to as entities—drawn from some pre-defined universe thereof, which can be referred to collectively as such by name. The entities are then said to be members of the group, the group being a compound object. By virtue of sharing all the properties required for group membership—referred to as defining properties—entity members are of the same type. These properties define the entity type, instances of which (entities) are members of the group, distinguish them from the entities in the universe that are devoid of these properties and, thus, are neither of the same type, nor group members.

A group is specifiable by:

  • Entity type (i.e., the defining properties applied to the pre-defined universe);
  • Enumeration (exhaustive listing) of members.

While the former specification is preferable, there are circumstances in which it is either not possible or impractical.

Entities in a group are often conceptualized as having optional properties in addition to defining properties. Strictly speaking, this is a gloss: If an entity has one or more optional properties, this indicates that the entity is of a more specific type (a subtype)—namely, one for which those additional properties become defining. In some extreme cases, a very general type of entity may have only optional properties and no required shared properties at all. This situation usually occurs in one of two cases:

1.    We lack knowledge of defining properties, whether through lack of research, analysis, objective agreement, or intentional vagueness. A classic example is written or spoken works judged to be “poems”, where the collection thereof does not share any identifiable properties required of every member. In this case we can divide the group into a number of subgroups (and so entity subtypes), each of which has at least one defining property, and possibly an ill-defined subgroup the members of which share none of the defining properties of other subgroups. This latter subgroup is specifiable only by enumeration. In such situations, a group member must be among those enumerated, and alsohave the known subset of the defining properties for some defined subgroup (possibly including one with no defining properties).

 In some cases, no defining properties can be identified for any subgroup. Then, a member of the group is any entity that is enumerated. This way of specifying is especially useful when we expect or hope to learn the defining properties in due time, eventually eliminating the need for enumerated members.

2.     A collection of entities that have nothing in common—instead of shared properties, the group definer simply designates the members of a group without making any further claims about common properties, in which case the commonality is not intrinsic to entities. Often, the only commonality for group membership involves a selective action on entities by some agent (e.g., a person); for example, someone’s collection of favorite things. The members of such a group need have nothing in common other than being so designated.

Designation Property

It is useful to view the act of designating entities as members of a group as if it creates such a property—a designation property. Although not intrinsic like properties (which describe entities), it is assigned similar to a name (that identifies entities). Names are represented in the database by keys, properties by non-key attributes (see The Key to Relational Keys: Primary Keys). Then we can treat groups resulting from designation as sharing it and names to specify groups by enumeration.

Having expanded our understanding of properties to include designation, we can now define all groups as having members selected from the pre-defined Universe on the basis of the shared defining properties (type)—if nothing else, at least the designation property (name). In essence, we have predefined that portion of the Universe whose members are of the corresponding type. Of course, use of the designation is best avoided whenever possible in favor of definition by type.

Note: Throughout the rest of this discussion, keep in mind that, although we have focused the foregoing on groups of objects, the principles presented apply equally to groups of property values as well. There are many circumstances in which permissible values of a property is best specified by designation and therefore enumerated. The alternative is to specify a procedure by which the values for a property can be observed or measured. Of course, few database designers would be bothered to do this except for scientific or legal purposes.

Designation Predicate

Note: Before I delve into the logical formalization of the conceptual designation property for database representation, a note on terminology.

E.F.Codd (EFC) created RDM by integrating compatible ideas and concepts of set theory, logic, and computing, which use distinct terms for what is essentially the same concept. For example, set in set theory, type in logic, abstract data type (ADT) in computer science (e.g., in programming languages, including any that can express the relational algebra (RA)), and domain in RDM are identical concepts, but the term differs depending on which vocabulary of the three formal languages is used. Further, a particular formal language may limit the operations on a type to a small set. For example, in set theory, the legal operations are the usual set operators. Taking this fact into account, set, type, ADT, and domain are all defined by specifying the collection of permissible values (either extensionally or intensionally) and the operations allowed on those values.

Data practitioners don't mix formal systems, or consider the degree to which they can be used to say the same thing and what is gained from their differences, which is conducive to confusion within the logical level that is only amplified by rampant confusion of the conceptual and logical level exacerbayed by natural language limitations.

 

In RDM, the assigned interpretation of each database relation is a group of entities (represented by tuples) that, by virtue of sharing defining properties, are of the same type and, thus, members of the same conceptual group. Each defining property is represented by a non-key attribute defined on a domain. Thus, each tuple represents facts (a name value and a set of property values)  about a distinct entity (instance of the type). The informal business rules (BR) that describe each entity type (i.e., the defining entity and group properties), jointly formalize in FOPL as a relation predicate (RP) which, when expressed in a relational data sublanguage as constraints, can be enforced by a RDBMS on the relation to ensure consistency of the data with its interpretation; namely, the group it is intended to represent (see What Meaning Means: BusinessRules, Predicates, Integrity Constraints and Database Consistency). 

In each RP—formalized from a group's conceptual definition (BRs)—each defining property is represented by a property predicate

PP(t,p)

where t identifies the tuple and p is a value from the domain of the corresponding attribute that represents the property. The RP must include the conjunction of one PP for each non-key attribute, along with any name predicate for the tuple. A bit informally, the PP asserts that any entity that instantiates the predicate with value p such that it is logically true has the represented property. In ordinary FOPL, a property predicate actually specifies “the subset of the universe of entities that have the represented property P”, and the defining predicate for entities having that one property specifies the intersection of the set of “all entities t of type T in the universe of entities U” with the set of “all entities having that represented property P with value p” (or satisfy the property predicate). Symbolically:

FORALL t: (t U) (U PP(t,p)

 If T is defined as having property P, we can write:

(t T) (FORALL t: (t U) (U PP(t, p1, p2, …, pn)) 

But as note above, the designation property isn't really a(n inherent) property of entities, but an assertion by an authorized user (e.g., modeler or administrator) that the given entity type shares one or more unknown characteristics (not necessarily intrinsic properties) sufficient for the intended use of the group.

More formally, the designation property should be called a metalogical property (alternatively, meta-property, or quasi-property). To express it in a general, yet formal language, would require at least second order logic (which we want to avoid in database management and stick to FOPL). Nonetheless, as long as it is treated as primitive (unanalyzable), we can represent such a metalogical property as if it were just another first order property predicate when defining domains extensionally. Consequently, all domains (and the groups they represent) become expressible formally in a uniform way, whether intensional or extensional. As such, the designation property:

  • Is not explicitly in CMs;
  • Is not explicitly represented as an attribute or constraint in LMs;
  • Is not explicitly included in RP expressions;
  • Should not normally appear in tabular displays of relations with which database users interact (and which they confuse with relations);
  • Is not included in the legacy interpretation of RDM dominant in the industry, nor even in the technical literature.

and it formalizes as a designation predicate (DP) corresponding to a relational domain of entity names on which PK attributes are defined.

Note: Domains may represent entity properties or names (assigned identifiers). While the former consist of a set of property values, the latter consist of a set of entity names; for example, sets of permissible surrogate key values are defined as domains representing names.

This requires capturing extensionally defined (enumerated) domains as an exclusive disjunction of all the members that have been designated as belonging to the group. Thus, a definition of the domain is an exclusive disjunctive predicate of values (since at most one of those values is assigned to an attribute). We can then record the value of an attribute over a domain representing the designation property in the database in exactly the same way we record ordinary properties observed informally and conceptually. This means that domains that represent entity names look very much like domains representing properties, except that the domain values represent non-inherent entity identifiers (e.g., names) rather than values of a property per se. In other words, a group of entities defined by enumeration that share only a designation property as defining is represented formally in the database by a relation defined on a single domain of names defined extensionally.

Note: For certain entity types, a subset of the defining property values (i.e., a natural key) might still suffice to uniquely identify each entity of the type.

Thus, we formalize the act of designation of membership values in such a domain in a way that extensionally defined domains can be expressed formally in the same way as intensionally defined domains via what we call a designation predicate

 DP(n,D)

where n represents a name or property value and D represents a domain that represents either a property or a name, respectively. The DP is added explicitly to the domain predicate (that defines the domain). Whether an attribute of a relation is defined on a domain that is intensionally defined or extensionally defined is now transparent in the RDM.

It is sometimes useful to define a group as the conjunction of a group defined by entity type (defining properties) and another group defined by enumeration. This is done when expanding a group defined by shared properties with one or or more members for which there is insufficient knowledge of their optional properties, the result being an augmented group.  For example, say entities in group A share properties and entities in group B do not. To combine them in one augmented group U

U = A+B  (i.e., the union which is either A or B)

We have an entity supertype-subtypes (ESS) relationship and structurally we have three relations:

RU: defined by the disjunct of RP(A), RP(B) = RP(A) OR RP(B);

RA: the subset of U with shared properties—a relation with a key (name) and non-key attributes (shared properties);

RB: the subset of U with shared properties—a relation with only a key (name).

where RU represents the supertype and RA and RB the subtypes.

In Part 2, we will examine another meta-logical property, one that plays a crucial role in the state of database relations and allows us to represent the changing state of a database without having to invent semantics for “time-varying” relations per se. 

 (Continued in Part 2)

 

 

 

 

 

 

 

Wednesday, November 22, 2023

HAPPY HOLIDAYS!



Due to: 

 

1. Taking care of some health issues that have accumulated (not getting any younger);

2. Concentration on the Israel-Hamas War;

3. Effort to update old papers and write new ones;

4. Much needed rest and the holidays.

 

I am taking the remaining of the year off and will re-start my contributions in January.

 

Wishing you and yours season's greetings and happy holidays!







Friday, October 13, 2023

EVERYBODY THINK THEY KNOW FIRST NORMAL FORM, BUT NOBODY DOES



“I have read this article in an effort to boost my academic knowledge on data modeling a bit and still have no idea what this academic author wanted to say. Apparently First Normal Form (1NF) doesn't get enough respect and then proceeds to talk about Non-First Normal Form (NFNF). But what about First Normal Form (1NF) damnit.”

By sheer chance this was posted on LinkedIn just after I published my new paper The First Normal Form: A Definitive Guide.

PRACTICAL DATABASE FOUNDATIONS

FIRST NORMAL FORM

A DEFINITIVE GUIDE

(September 2023)

Fabian Pascal

 

Table of Contents

 Introduction

1.      The Normal Form

2.      The First Normal Form

3.      Domain Decomposability & Atomicity

4.      1NF & Tables

5.      SQL & 1NF

5.1.     Repeating Groups & Repeated Attributes

5.2.   Information Principle & SQL
 

Sunday, September 17, 2023

DBDEBUNK REFRESH



After 23+ years DBDebunk.com has undergone a slight refresh. Here is an outline of the changes.

1. The BOOKS and PAPERS are now unified into one PUBS page from which my papers and self-published books can be ordered. There are two paper series:

  • Understanding the Real RDM: reviews and analysis of some Codd's published papers 1969-85 in the spirit of McGoveran re-interpretation of his work--distinct from that dominant in the industry.
  • Practical Database Foundations: shorter papers about narrower aspects of the RDM that are abused, misused and ignored in the industry.

Some papers will be brought up to date/re-written and new ones will be published.

2. The recommended books links from the BOOKS page have been moved to the LINKS page.

3. The FUNDAMENTALS page was replaced by a SEARCH page that contains a newly compiled data and relational fundamentals comprehensive dictionary of terms to be maintained on a regular basis. It includes abbreviations/acronyms to be used as labels for searching. Use the page as a reference in conjunction with the labels to compensate partially for a limitation of the Blogger labels facility.

4. Labels now exist only for the search terms that have known abbreviations  or acronyms. The SEARCH page should be used to identify which terms have labels and those that do not must be used with the Blogger search facility (see details on ABOUT page).


Any problems, comments, suggestions, questions, or support are welcome--drop me an email.


 

 

 

 

Thursday, August 17, 2023

ENTITIES, PROPERTIES AND CODD'S SLEIGHT OF HAND



Note: This a revision of an earlier post

RDM is an application to database management of mathematical relation theory (MRT) consistent with simple set theory (SST) expressible in first order predicate logic (SST/FOPL) that is used to formalize symbolically conceptual models of reality as logical models for database representation.

In RDM a domain can "appear" in multiple relations: the domain represents an abstract property, attributes defined on it represent that property in contexts of specific entity groups that relations represent. For example, attribute SALARY in relation EMPLOYEES represents the property represented by domain MONEY in the context of entities of type Employee and attribute BUDGET in relation DEPARTMENTS represents it in the context of entities of type Department.

View My Stats