Sunday, February 10, 2019

Understanding Domains and Attributes




Note: This is a rewrite of one section of an older post (page thereof now links here), to bring it into line with McGoveran's formalization, re-interpretation, and extension of Codd's RDM[1]. The rewrite of the other part will be posted next.
“I don't understand the concepts of domain and attribute in relational database modeling. Can someone give me an effective example?”

“Domain is an overloaded word in the DB lexicon. It probably should also be avoided. When one refers to an attribute domain in practice it is only referring to columns that have a check constraint on them that limit the values. Reference tables with foreign key constraints in general also fulfill the spirit of what domain attributes do outside of an RDBMS.”

“A domain in most SQL usage is essentially an alias name for an existing type + restrictions on an existing type that can be used in a column. As for an attribute, it's essentially a COLUMN in SQL, a field in other types of databases, etc.”
To the extent that practitioners are familiar with domains, they equate them with programming data types (PDT), or, at best, with SQL data types.

Test your foundation knowledge -- are domains the same as PDTs or SQL data types?


------------------------------------------------------------------------------------------------------------------
SUPPORT THIS SITE 

Up to 2018, DBDebunk maintained and kept free with the proceeds from my @AllAnalitics column. In 2018 the website was has been discontinued. The content is not available anywhere else, and if you deem it useful, particularly if you are a regular reader, please ensure its continuation and free availability by supporting it with as much as you can afford via purchases of publications, or donations. Thank you.

LATEST PUBLICATIONS 




· Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING OF THE REAL RDM series, is available for ordering here.

· Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING OF THE REAL RDM series, is available for ordering here.

SOCIAL MEDIA 

I deleted my Facebook account. You can follow me on Twitter:

  • @dbdebunk: will contain links to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture which I brink back on LinkedIn.
  • @ThePostWest: will contain evidence for, and my take on the spike in Anti-semitism that usually accompanies existential crises. The current one is due to the decadent decline of the West and the corresponding breakdown of the world order.
HOUSEKEEPING

  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the FUNDAMENTALS page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 are relabeled. As other older posts are rewritten, they will also be relabeled, but in the meantime, use Blogger search for them. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics work. 
------------------------------------------------------------------------------------------------------------------

Abstract Data Types


An abstract data type (ADT) is a formal concept defined as:
  • A set of values consistent with an intensional definition -- the criterion for membership of values in the set;
  • Well-defined operators on those values:
- effective algorithms for the operators; and,
- defined output ADTs for results.

Only the defined operators can be applied to ADT values.

Note: A set of objects can be defined extensionally (by enumerating its members), or intensionally (by the properties objects are required to share to qualify for membership).

Properties and Relational Domains


A relation is a set of tuples that represents formally in the database (facts about) entities that are members of a group by virtue of sharing required properties -- the defining properties of the group -- which jointly serve as part of the membership criterion that members must satisfy. A group can be specified by enumeration of its members, or by specifying its membership criterion -- its defining properties.

Note: Collective group properties, arising from relationships among all the members are also group defining properties.

A relational domain is an application of the ADT to database management. The DBMS implements abstract ADTs, which in each specific database have an interpretation: they represent formally real world properties specified by the business rules (BR) in the conceptual model.

A property is:

  • Atomic, if only its existence/non-existence can be observed (entities either have it or not), in which case it does not have values per se, but rather dichotomous 'truth values' (e.g., 'true'/'false');
  • Non-atomic, if it is both observed and measured, in which case it is multivalued, and can be either:
- qualitative, if values have no intrinsic order (e.g., taste {sweet,sour,bitter,salty}); or,
- quantitative, if values are ordered or numeric (e.g., degree of saltiness, or income)[1].

Inherently atomic properties (i.e., that are atomic because of some logical or intrinsic aspect of the subject) are necessarily also primitive. Non-atomic properties can be either primitive, or derived from other properties. Property atomicity is not absolute -- it depends on (1) the understanding of reality as expressed in the conceptual model, which may deepen or change over time, and (2) the level of detail that needs to be formally represented, which may expand or contract over time. For example:

  • Is_painted (yes/no) is atomic;
  • Paint_color is non-atomic and either:
- primitive {blue,green,red,yellow,black,white} either (1) at a less sopisticated level of understanding of color, or (2) by choice, if it's sufficient for database representation [of the subject matter] (2); or,
- derived ([e.g.,] a combination of specific wavelength and absorption coefficient or reflectivity) at a [more sophisticated] level of understanding of color[1].

Note: The RDM relies implicitly on McGoveran's Information Representation Conjecture, that implicitly seems to underlie almost all digital communication and information theory:


“Any non-atomic property can be analyzed into a set of atomic properties, each of which can be encoded via a single bit of information.”
otherwise it would not be a useful theory for computing/databases. For example, qualitative property taste {sweet,sour,bitter,salty} can be expressed as a combination of atomic properties sweetness, sourness, bitterness, and saltiness. If taste were quantitative (e.g., degree of sweetness...), the expression would be more complex[1].

Atomic properties are represented by primitive domains, non-atomic properties by either primitive, or derived domains. Properties are atomic when there is no intensional definition. Non-atomic domains are defined intensionally:
“Extensional [definitions limit the] ability to capture and distinguish meanings, thereby limiting interpretational power disadvantageously. We believe it is self-evident that it is necessary to distinguish between properties that have the same extension, but a different sense. For example, if we limit a domain to 'integers except 9', the extension of both (a) the property 'the smallest even prime', and (b) the property 'the cube root of some n' is exactly 2, but these properties have a different sense and that difference must not be lost.”
--David McGoveran

Domains and Programming Data Types


The theory behind PDTs is based on ADTs, but they are hardly ever used as such in practice, and programming languages are rarely strong in this regard  PDTs are usually fixed, standard based, and reflect physical machine limitations of the day, which is hardly abstract (OOP does try, but starts with physical data types as primitive, and does not always enforce ADT principles when a data type is defined).
 
Among the core motivations for the RDM were data independence and decidability, which require separation of a FOPL-based declarative data sublanguage from the procedurality and higher logic of computationally complete programming languages (CCL)[2,3]. To that end relations have only simple domains, the values of which are treated as atomic by the relational sublanguage (i.e., they have no components meaningful to users/applications)[4]: the sublanguage does not allow application programs access to tuples and attributes of the values of relation-valued domains (RVD), if any[5].

Note: Atomic domains that represent atomic properties should not be confused with domains with atomic values with respect to the data sublanguage.

“What Codd meant by "atomic values" is that relational algebra (RA) does not manipulate internal components of domain values, and has no knowledge of the algorithms used in domain operators. RA only sees the inputs and outputs of domain operators, and those are always attribute values, tuples, or relations -- the operators are fully encapsulated, which is why the language for implementing domain operators is not constrained to RA -- it is safe to use the full power of a CCL.”

--David McGoveran 

Codd introduced domains (which he called "extended types"[6]) in part to distinguish them from PDTs as used in industry practice. A relational domain:
  • Represents a real world property;
  • Is abstract (logical);
  • Has atomic values;
  • Is a database designer defined, DBMS-constrained value set.
while PDTs do not necessarily represent properties, are not necessarily simple, are tied to physical implementation, and are programmer defined and application constrained.

Note: A FOPL-based relational sublanguage should be properly integrated in a CCL. Details are beyond the scope of this discussion, but suffice it to say here that properly means that such integration should not violate FOPL.

Properties in Context and Attributes


Codd started only with domains, and subsequently used terms such as "roles" and "active domains", before he settled on attributes. We now contend that an attribute is a representation of a domain in the context of the group represented by the relation[1].

A property in the conceptual model can be associated with entities of more than one type. When associated with entities of a type, it is a property in context (PiC)[1,7] -- the corresponding entity group being the context -- which we refer to as a first order property (1OP). 1OPiCs (first order PiCs) are represented by attributes at the logical level. For example, the property 'amount', represented formally by domain AMOUNT, can be associated with entities of type 'order' as an 'order amount' PiC, and with entities of type 'customer' as a 'customer balance' PiC, represented by attributes ORDER_AMT and CUST_BAL, respectively. Both attributes are representations of AMOUNT in the ORDERS and CUSTOMERS relations.

At a logical level an attribute is, formally, a simple function (1:1 mapping) of a domain[1], and is either the same set of values as the corresponding domain, or a subset thereof.

Note: Another context is time. Database relations are time-varying[8] (i.e., each represents facts about  membership of a group that varies over time via updates[9], but are always evaluated as a specific point in time, and so are well-defined sets (discussed in next post). We alert the reader that the temporal aspect is poorly understood and treated in the relational context, which has the potential to destroy both the SST and FOPL foundations of the RDM (see [1] for details).

Relations must be semantically consistent (i.e., represent faithfully the BRs in the conceptual model), which include property and PiC rules[7], and formalize as domain and attribute constraints, and must be declared by the database designer and enforced by the DBMS, just like all other BRs and constraints, respectively.

SQL Data Types


SQL databases and DBMSs are not truly relational[10,11,12,13] and, thus:

  • SQL data types are not genuine domains: they are akin to PDTs, rather than designer-constrainable ADTs, there is no support for deriving domains from primitive or other derived domains, and so on;
  • SQL columns are not genuine attributes (i.e., are not (functional) representations of identified domains, are ordered, and so on.


Conclusion


You are now in a position to assess the quotes above.

Data professionals should isolate themselves from ignorance, not from knowledge.



Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.


References

[1] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLKS, forthcoming (draft chapters).

[2] Pascal, F., Natural, Programming, and Data Languages.

[3] Pascal, F., Data Sublanguages, Programming, and Data Integrity.

[4] Pascal, F., Simple Domains and Value Atomicity.

[5] Pascal, F., First Normal Form in Theory and Practice Parts 1-3.

[6] Codd, E.F., THE RELATIONAL MODEL FOR DATABASE MANAGEMENT: VERSION 2.

[7] Pascal, F., Data and Meaning Part 2: Types of Business Rules.

[8] Codd, E.F., A Relational Model of Data for Large Shared Data Banks.

[9] Pascal, F., Designation Property and Assertion Predicate.

[10] Pascal, F., SQL Sins.

[11] Pascal, F., The Interpretation and Representation of Database Relations.

[12] Pascal, F., What Relations Really Are and Why They Are Important.

[13] Pascal, F., To Really Understand Integrity, Don't Start with SQL.



No comments:

Post a Comment

View My Stats