Wednesday, August 15, 2018

Order Is For Society, Not Databases




8/18/18: I have re-written this post for a better explanation. If you read it prior to the revision, you should re-read it.
 
“I learned that there is no concept of order in terms of tuples (e.g. rows) in a table, but according to wikipedia "a tuple is an ordered list of elements". Does that mean that attributes do have an order? If yes why would they be treated differently, couldn't one add another column to a table (which is why the tuples don't have order)? [OTOH], "In this notation, attribute–value pairs may appear in any order." Does this mean attributes have no order?”
--Do the “columns” in a table in a RMDB have order?
“Is it possible to reorder rows in SQL database? For example, how can I swap the order of 2nd row and 3rd row's values? The order of the row is important to me since i need to display the value according to the order [and] 'Order by' won't work for me. For example, I put a list of bookmarks in database. I want to display based on the result I get from query. (not in alphabet order). Just when they are inserted. But user may re-arrange the position of the bookmark (in any way he/she wants). So I can't use 'order by'. An example is how the bookmark display in the bookmark in firefox. User can switch position easily. How can I mention that in DB?”
--How can I reorder rows in sql database

While some data professionals may know that rows and columns of "database tables" are "unordered", few of them know what that means, and understand why. This is due to two, not unrelated, of the many common misconceptions[1] rooted in the lack of foundation knowledge in the industry, namely that relational databases consist of tables[2], and logical-physical confusion (LPC)[3]. They obscure understanding of the RDM and its practical implications, which is reflected in the answers to the above questions. Instead of debunking them, this post fills the gap in knowledge such that you can debunk them yourself -- try it before and after you read it.




------------------------------------------------------------------------------------------------------------------
SUPPORT THIS SITE 

I have been using the proceeds from my monthly blog @AllAnalytics to maintain DBDebunk and keep it free. Unfortunately, AllAnalytics has been discontinued. I appeal to my readers, particularly regular ones: If you deem this site worthy of continuing, please support its upkeep. A regular monthly contribution will ensure this unique material unavailable anywhere else will continue to be free. A generous reader has offered to match all contributions, so let's take advantage of his generosity. Purchasing my papers and books will also help. Thank you. 

NEW PUBLICATIONS 

NEW: The Key to Relational Keys: A New Perspective

NEW: SOCIAL MEDIA 

I deleted my Facebook account. You can follow me on Twitter:

  • @dbdebunk: will contain links to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture, which I am bringing back.
  • @ThePostWest: will contain evidence for, and my take on the spike in Anti-semitism that usually accompanies existential crises. The current one is due to the decadent decline of the West, and the corresponding breakdown of the world order.

HOUSEKEEPING

  • To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with the FUNDAMENTALS page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 are relabeled. As other older posts are rewritten, they will also be relabeled, but in the meantime, use Blogger search for them. 
  • Following the discontinuation of AllAnalytics, the links to my columns there no longer work. I moved the 2017 columns to dbdebunk and, time permitting, may gradually move all of them. Within the columns, only the links to sources external to AllAnalytics work. 
------------------------------------------------------------------------------------------------------------------


Mathematical vs. Database Relations


In simple set theory (SST):
“... nothing in the definition of a set demands that the members of a set be orderable, [and] nothing in the definition forbids it either. Contrary to some authors, it is not wrong to speak of an “ordered set” (a set whose members have some order) [but t]he order or lack thereof of a set’s members is completely orthogonal to the concept and definition of a set ... When representing a set, we identify each member of the set with a unique symbol ... the members of a set may not be distinguished from each other merely by their order of appearance in any representation of the set: thus, the set {0,1,3,1,2} is identical to the set {0,1,3,2} and to the set {0,1,2,3}. Note that the inability to use order in the representation of a set as a distinguishing property does not mean that there cannot be an ordering among the members of a set.”[4]
In other words, the order of set members is insignificant -- it is not part of the set definition.

In mathematical relation theory (MTR), a relation is a special kind of set: a set of n-valued tuples (its members) that is a subset of the Cartesian product of n domains. Tuples are sets of n values, one value from each domain and, as set members, their order is insignificant. Domains are sets of values labeled by symbols indexed with unique numeric subscripts, distinguishable by their position in order of the numeric subscripts (first, second, and so on).

Relations are mathematical abstractions devoid of real world meaning. The RDM adapts the MTR, and the SST expressible in first order predicate logic (FOPL) for applicability to database management. It makes adjustments to mathematical relations and the set operations that manipulate them, such that, on the one hand (1) they have a real world interpretation (i.e., carry meaning):
  • Database relations represent entity groups;
  • Domains represent entity properties;
  • Attributes represent properties in context (i.e., are representations of domains in specific contexts)[];
  • Tuples represent facts about entities;
  • Constraints represent relationships among properties and among entities[5].

while, on the other hand (2) their formal theoretical properties (in MTR, SST and FOPL) are preserved. It's these properties that confer the crucial advantages unique to relational database management, among them:
  • DBMS-guaranteed logical validity and database design that ensures semantic correctness[6];
  • Decidable declarative data sub-language[7];
  • Physical and logical independence[8,9];
  • Simplicity[10]

and so on. One of the formal properties to be preserved is the insignificance of order of tuples and attributes -- RDM's set operations of the relational algebra (RA) expressible in FOPL are independent of such order.

Note: The subscripted labels of domains are essentially "numeric names" that distinguish domains from one another. One of the RDM adjustments is:
“In many commercial, governmental, and scientific data banks, however, some of the relations are of quite high degree (a degree of 30 is not at all uncommon). Users should not normally be burdened with remembering the domain ordering of any relation ... To accomplish this, domains must be uniquely identifiable at least within any given relation, without using position ... so that users could deal with the [relation] ... and its domains ... without regard to any ordering between these domains.”[11]
The addition of attributes and primary keys (PK)[12] are adjustments that also enable order independence in relational databases.


Order Independence


Data practitioners believe relational databases consist of tables -- belief for which SQL has responsibility. Tables are not relations, but tabular presentations thereof on some physical medium (paper, screen) -- referred to as R-tables -- that do not have the formal theoretical properties of relations. In particular, R-tables do have a left-right column and top-bottom arrangement on the medium that induces the impression of a significant order -- let's call it display order.

Other than carrying meaning, database relations differ from abstract mathematical relations in that the data of some of them are physically stored. The data of a relation can be stored "as a table" (i.e., a file with one physical record per tuple -- referred to as direct image implementation), but there are many ways to store data (e.g., "columnar databases" may store one physical record per attribute, rather than tuple), and each way has a physical order that we shall call storage order.

As explained, for relational advantages to materialize in database practice, the formal theoretical properties of the RDM, including order insignificance, must be preserved. Display and storage order are both physical, but distinct, even though they are often confused in the industry -- one motivation for the RDM[11].

The row-column arrangement on the display medium is a property of the tabular presentation of relations, and presentation is an application, not DBMS function, outside the purview of the RDM[13].

Base relations are sets of tuples that represent facts about entities chosen by users to be represented in the database (axioms in logic). Users and applications query the database by applying RA operations to relations to make inferences (logic deductions) from those facts -- derived relations that are sets of tuples that represent facts that are logical implications of the base facts (theorems in logic). As we have stressed more than once, if and only if (1) the DBMS implements the RDM[14] and (2) users adhere relational database design principles[15] (i.e., both the DBMS and databases are relational), storage order is insignificant, namely, data manipulation is order independent -- a relational data sub-language does not reference it.

Unfortunately, no available DBMS is relational, and neither are most databases[16]. Even a true RDBMS (if any existed) could not ensure that database designers will not encode information in the order of tuples or attributes, rendering it significant. For example, if the age of tuples is meaningful, but the design leaves it implicit in the storage order of tuples, it is inaccessible to RA operations expressible in FOPL, and:

  • Accessing it requires non-relational manipulation and a logic higher than FOPL, with loss of the RDM advantages; and,
  • If the storage order changes -- as may be necessary for performance optimization -- the information is lost, and queries and applications may no longer work, requiring maintenance.
Otherwise put, all bets are off. It is for this reason that the core RDM Information Principle (IP) mandates user design discipline: "All information in a relational database is represented explicitly and in exactly one way: as values of relation attributes"[17]. It prohibits encoding information implicitly in the order of tuples and attributes, or in any way other than explicitly, as attribute values drawn from domains.

A Note on Order in SQL


While SQL DBMSs have been sold as RDBMSs, they are not relational for a variety of reasons[18]. With respect to order:
  • ORDER BY is included in the data sub-language, obscuring that display order is the purview of applications;
  • SQL dialects support non-relational operations that rely on order. For example, Oracle and DB2 have proprietary operations on tree-structured data that produce results that are bags, not relations (i.e., contain duplicates) with significant tuple order, in violation of relational closure[19]. Because reordering loses information, nestability of operations, usually necessary for hierarchies, is no longer guaranteed.

As to attribute order in SQL, as one of the answer points out:
“In SQL, the ordering of attributes in tuples and headings is quite meaningful, and the consequences are all over the place. In FK declarations, correspondance of the respective referring and referred attributes is by ordinal position, not by name. Other cases are with UNIONs and EXCEPTs.”
and you can check the consequences in the given examples.

Note: Incidentally, can you detect a misconception due to confusion of relations with tables?

You now have the knowledge necessary to debunk the questions and the answers.



References

[1] Pascal, F., THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS.

[2] Pascal, F., Understanding Relations Part 1: Tables? So What?

[3] Pascal, F., The Conceptual-Logical Conflation and the Logical-Physical Confusion.

[4] McGoveran, D., LOGIC FOR SERIOUS DATABASE FOLK, forthcoming.

[5] Pascal, F., Relationships and the RDM Parts 1,2,3.


[6] Pascal, F., Object Orientation, Relational Database Design, Logical Validity and Semantic Correctness.

[7] Pascal, F., Natural, Programming, and Data Language.

[8] Pascal, F., Physical Independence Parts 1,2. 

[9] McGoveran, D., and Date, C.J., On View Updating.

[10] Pascal, F., Simplicity: Forgotten, Misunderstood, Underrated Relational Objective.
 
[11] Codd, E.F., A Relational Model of Data for Large Shared Data Banks, Commun. ACM 13(6): 377-387 (1970).
 
[12] Pascal, F., The Key to Relational Keys: A New Understanding.

[13] Pascal, F., Integrity Is Not Only Referential: DBMS vs Application Enforced Constraints.

[14] Pascal, F., What Is a True Relational System (and What It Is Not).


[15] Pascal, F., Database Design: What It Is and Isn't.

[16] Pascal, F., Don't Design Databases Without Foundation Knowledge and Conceptual Models.

[17] Pascal, F., Interpreting Codd: The 12 Rules.

[18] Pascal, F., SQL Sins.


[19] Pascal, F., What Relations Really Are and Why They Are Important.


Note: I will not publish or respond to anonymous comments. If you have something to say, stand behind it. Otherwise don't bother, it'll be ignored.





No comments:

Post a Comment