Saturday, February 16, 2019

Class, Type, Set, Relvar, and Relation

Note: This is a rewrite of a part of an older post (now redirecting here), to bring it into line with McGoveran's formalization, re-interpretation, and extension of Codd's RDM[1] (the rewrite of the other part was posted last week).
“[According to Date] relvar = class. [But i]n simple terms, class applies to a collection of values allowed by a predicate, regardless of whether such a collection could actually exist. Every set has a corresponding class, although a class may have no corresponding set ... in mathematical logic, a relation is a class (and trivially also a set), which contributes to confusion.”

“In modern programming parlance, class is generally distinguished from type only in that the latter refers to primitive (system-defined) data definitions, while class refers to higher-level (user-defined) data definitions. This distinction is almost arbitrary, and in some contexts, type and class are actually synonymous.”
Class, type, and set are often used interchangeably in the industry. Relations are neither class nor type, and Date's relvars must be placed properly in their formal context. While details regarding these concepts vary with the flavor of set theory, they are sufficiently well defined to be distinguishable in each of the three formal foundations of the RDM: simple set theory (SST), mathematical relation theory, and first-order predicate logic (FOPL).

Sunday, February 10, 2019

Understanding Domains and Attributes

Note: This is a rewrite of one section of an older post (which now links here), to bring it into line with McGoveran's formalization, re-interpretation, and extension of Codd's RDM[1]. The rewrite of the other part will be posted next.
“I don't understand the concepts of domain and attribute in relational database modeling. Can someone give me an effective example?”

“Domain is an overloaded word in the DB lexicon. It probably should also be avoided. When one refers to an attribute domain in practice it is only referring to columns that have a check constraint on them that limit the values. Reference tables with foreign key constraints in general also fulfill the spirit of what domain attributes do outside of an RDBMS.”

“A domain in most SQL usage is essentially an alias name for an existing type + restrictions on an existing type that can be used in a column. As for an attribute, it's essentially a COLUMN in SQL, a field in other types of databases, etc.”
To the extent that practitioners are familiar with domains, they equate them with programming data types (PDT), or, at best, with SQL data types.

Test your foundation knowledge -- are domains the same as PDTs or SQL data types?

Saturday, January 19, 2019

Data and Meaning Part 4: Query and Result Correctness

As we have seen in Parts 1, 2, and 3, the RDM is a formal theory adapted and applied to database management: database relations (1) preserve the formal properties of mathematical relations, but also (2) have interpretations -- they carry a real-world meaning assigned by a conceptual model: facts about entities, entity groups, and multigroups (i.e., their properties, some of which are relationships, specified by business rules (BR)). A relation is formally in 5NF and constrained for semantic consistency (i.e., to represent facts about an entity group).
“When we create specific domains, relations, and attributes we are constraining (restricting) an abstract logical system to a specific interpretation (meaning). Seen the other way around, an interpretation of the logical system is a representation of a specific segment of the world, and that is exactly the purpose of database design. For example, an attribute name created by the designer is assigned meaning intended by the modeler as representing an entity property, which is the very meaning of semantics. That is why full normalization cannot be achieved or assessed without reference to some conceptual model -- what attribute names mean, and how they are related to each other (i.e., their dependencies), and so on.” --David McGoveran
Yet requesting and giving design advice without a conceptual model is routine in the industry[1]. What is more, most practitioners are oblivious to the implications for correctness of queries and results[2].

Wednesday, January 9, 2019

Data and Meaning Part 3: Database Design

We have seen in Part 2 that the meaning of data in a database is the conceptual model that the database is intended to represent, namely (1) the three types of objects -- entities of multiple types that form entity groups that form a multigroup -- and (2) the business rules (BR) that specify their properties:
  • Properties in context (PiC) shared by entities of each type;
  • Collective group properties (i.e., relationships among entity group members);
  • Multigroup properties (i.e., inter-group relationships).
Often somebody produces one or more tables and asks if there's "anything wrong" with them, or "if they are in some specific normal form and, if not, how to normalize them". This reflects a lack of foundation knowledge.

Tuesday, January 1, 2019

Data and Meaning Part 2: Types of Business Rules

Per Part 1, meaning is captured during conceptual modeling as information about objects of interest, specifically their properties (some of which are relationships), specified in business rules (BR). Because they are expressed informally in natural language, objects and BRs must be formalized into computable form. Data modeling (we prefer logical database design) uses a formal data model to formalize informal conceptual models as formal logical models for database representation: it assigns the meaning in the former to symbols and expressions in the latter[2]. Using the RDM:

  • Objects -- entities, entity groups, and multigroups -- formalize as tuples, relations, and databases, respectively;
  • Properties formalize as domains, and when associated with entities of specific types, as attributes;
  • Group and multigroup properties -- relationships among entities, and among groups[3] -- formalize as constraints on and among relations enforceable by the DBMS.
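
The mapping in the list above can be loosely sketched in Python. This is an illustration only, not McGoveran's formalization and not how any DBMS implements the RDM: every name in it (Domain, Attribute, make_tuple, make_relation, references, and the department/employee example) is hypothetical, and a key constraint and a referential constraint stand in, as assumptions, for group and multigroup properties.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical names throughout; a teaching sketch, not a DBMS design.
@dataclass(frozen=True)
class Domain:
    """A property formalized as a named set of permissible values."""
    name: str
    contains: Callable[[object], bool]


@dataclass(frozen=True)
class Attribute:
    """A domain associated with entities of a specific type."""
    name: str
    domain: Domain


def make_tuple(heading, **values):
    """Formalize one entity as a tuple, rejecting values outside their domains."""
    if set(values) != {a.name for a in heading}:
        raise ValueError("tuple must supply exactly the heading's attributes")
    for a in heading:
        if not a.domain.contains(values[a.name]):
            raise ValueError(f"{values[a.name]!r} is not in domain {a.domain.name}")
    return frozenset(values.items())


def make_relation(heading, key, rows):
    """Formalize an entity group as a set of tuples under a key constraint."""
    body = {make_tuple(heading, **row) for row in rows}
    key_values = [tuple(dict(t)[k] for k in key) for t in body]
    if len(key_values) != len(set(key_values)):
        raise ValueError("key constraint violated")
    return frozenset(body)


def references(child, child_attr, parent, parent_attr):
    """Check an inter-relation (multigroup) constraint: a referential rule."""
    parent_values = {dict(t)[parent_attr] for t in parent}
    return all(dict(t)[child_attr] in parent_values for t in child)


positive_int = Domain("POSITIVE_INT", lambda v: isinstance(v, int) and v > 0)
name_text = Domain("NAME", lambda v: isinstance(v, str) and v != "")

dept_heading = (Attribute("dept_no", positive_int),
                Attribute("dept_name", name_text))
emp_heading = (Attribute("emp_no", positive_int),
               Attribute("emp_name", name_text),
               Attribute("dept_no", positive_int))

departments = make_relation(dept_heading, ("dept_no",), [
    {"dept_no": 1, "dept_name": "Sales"},
    {"dept_no": 2, "dept_name": "Research"},
])
employees = make_relation(emp_heading, ("emp_no",), [
    {"emp_no": 10, "emp_name": "Ann", "dept_no": 1},
    {"emp_no": 11, "emp_name": "Bob", "dept_no": 2},
])

# The database is the collection of relations plus the constraints among them.
database = {"departments": departments, "employees": employees}
```

In this sketch the constraints are checked by application code at construction time; in the RDM proper they would be declared to, and enforced by, the DBMS.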

Sunday, December 2, 2018

What Is a Data Model, and What It Is Not

“The term data model is used in two distinct but closely related senses. Sometimes it refers to an abstract formalization of the objects and relationships found in a particular application domain, for example the customers, products, and orders found in a manufacturing organization. At other times it refers to a set of concepts used in defining such formalizations: for example concepts such as entities, attributes, relations, or tables. So the "data model" of a banking application may be defined using the entity-relationship "data model". This article uses the term in both senses.”
--Data Model, Wikipedia

What a True Data Model Is

Few practitioners realize that Codd invented the Relational Data Model (RDM) as the first exemplar of a data model, a concept that he formalized in 1980 as follows: