Showing posts with label RA. Show all posts
Showing posts with label RA. Show all posts

Saturday, September 11, 2021

OBG: Data Warehouses Are Non-Relational Application Views



Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), Judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references. You can acquire foundation knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, even better, organize one of our on-site SEMINARS, which can be customized to specific needs).


ON DATA WAREHOUSES

(originally posted 11/10/2001)

“It is dawning on me that data warehouse techniques (as advocated by Ralph Kimball) are a response to perceived SQL DBMS performance weaknesses and nothing more. Dimensional modeling is a physical implementation design to deal with what may already be out of date performance tests to back up said design. I agree that where data warehouse logical design deviates from the relational data model there will be trouble down the road. I can attest to the high cost involved in maintaining a data warehouse. I am now questioning three purported benefits of using the current popular data warehouse design techniques.

Physical DBMS designs like the star schema produce faster more predictable query results than a normalized one (I realize normalization is a strictly logical concept but it does appear to have its direct physical mappings in current DBMS systems). Well, I am about to find out. We will be performing some benchmarks. What really are the query times performed on our OLTP system vs. a Star schema for a few relevant reports.

Data warehouses offload query processing from the OLTP system. This is true, but may not be necessary. One needs to thoroughly analyze the traffic on the OLTP system to see if offloading is necessary. We are looking into simply replicating the database (or part of it) for reporting purposes. Replication is far simpler to maintain than a data warehouse.

Data warehouse designs are simpler to understand than a relational one, thereby query construction is easier. I think this is more due to the fact that designers have had a bad habit of throwing up on a wall a full ER chart of their systems. Saying in effect Look how great I am, no one will ever understand this!  Creating ER diagrams of subsystems to describe important characteristics of a database can also be simple to understand.

Item #3 brings up a question. What do you think of OLAP tools such a Microsoft s Analysis Services? We have discovered you can use the tool without redesigning your database. We have seen some pretty fast query times if we take the option of allowing the tool to store data extracted from the production database into its own proprietary format. The opposition to its adoption here is the learning curve to master the MDX query language."
 

Fabian Pascal: The fact is that so-called "dimensional modeling" is logical, not physical modeling. Kimball does not present his "techniques" as just performance-maximization and, what is more, they won’t necessarily yield better performance. He also seems to believe, erroneously, that star-schemas are fully normalized designs. The real solution to performance problems is true RDBMSs with better implementations, not ad-hoc logical designs. Like so many, Kimball simply does not understand relational technology and confuses levels of representation.

I consider warehouses a regression to the good old days of application-biased files -- which we discarded for application-neutral and DBMSs, because they did not prove cost-effective. They are application-specific views of databases that are not derived via relational algebra from relational databases, but are rather non-relational SQL tables. The industry has a long and profitable history of recycling relabeled old failures, witness XML and graph throwbacks to hierarchic databases.

Aside from being necessary for soundness, fully normalized relational databases are the easiest to understand if (1) one thoroughly knows and understands the segment of reality to be represented in the database -- the business, that is -- and (2) data fundamentals. Arbitrary designs such as star-schemas are cope-outs, due to poor knowledge and understanding of the latter.

I am not familiar with the product, but I am generally wary of what vendors do.


Note on re-publication: See also Data Warehouses and the Logical-Physical Confusion.

 

 

 

 

Saturday, July 10, 2021

Relational Misconceptions Part 2: RDM is Applied Theory



In Part 1 we showed (yet again) how even those with their heart in the right (relational) place can't help being affected by the common and entrenched industry misconceptions, in this case about relationships, relations and tables. More often than not authors exhibit the very misconceptions they try to debunk.

We left the author distinguishing sets (with unordered, unique elements) from tables (lists of ordered, possibly duplicate rows). 

Thursday, March 25, 2021

OBG: Don't Confuse Levels of Representation Part 1



Note: To demonstrate the correctness and stability due to a sound theoretical foundation relative to the industry's fad-driven "cookbook" practices, I am re-publishing as "Oldies But Goodies" material from the old DBDebunk.com (2000-06), so that you can judge for yourself how well my arguments hold up and whether the industry has progressed beyond the misconceptions those arguments were intended to dispel. I may revise, break into parts, and/or add comments and/or references.

This is an email exchange with readers in response to my article Normalization and Performance: Never the Twain Shall Meet.

Friday, March 19, 2021

Data Sublanguages vs. Programming Languages



Revised 3/20/21

I recently came across a review of Edsger Dijkstra's work, where the following comment on a book he co-authored (referred to as D&S) raised my debunking antennae:

“... in general computer people seem to have a penchant for whipping up homebrew logics ... D&S is not the only example ... See E.F. Codd’s Relational Calculus, an obvious mess.”
--Maarten van Emden, A Bridge too Far: E.W. Dijkstra and Logic 

Having recently argued that "Codd was wrong" and "You're teaching [his] gospel" Betray Lack of Foundation Knowledge, my suspicion should hardly surprise. Besides, criticism of Dijkstra is a very tall order in itself, particularly in the context of disputes among logicians). As a reader asked, "What’s so obviously messy in Codd’s Relational Calculus?". Answer:

Monday, February 1, 2021

Normalization -- Will They Ever Learn?



“To Normalize or not to Normalize? that really isn't a question. few things to consider:
Normalization is supposed to protect from data anomalies, but not prevent us from using data encapsulation is the magic trick that allows you to do what you want without breaking rules.what are your experiences with normalization?”
                                                                --LinkedIn


This is a question that at this time need -- and should -- not be asked anymore, and the fact that it still is is one confirmation -- among many -- that there is no progress in data management. According to the current understanding of the RDM:

  • Database relations are both normalized (in 1NF) and fully normalized (in 5NF) by definition, otherwise they are not relations and the relational algebra (RA) does not work;
  • Adherence to three database design principles produces 1NF and 5NF relational databases;
  • Consequently, there should not be such a thing as "doing" normalization (to 1NF) and further normalization (to 5NF) except to repair databases that are non-relational due to failure to adhere to the principles.


Note: The three design principles are fundamental to SST/FOPL foundation of the RDM, but were never understood even by relational proponents. I do not know what encapsulation has to do with this.

Saturday, October 17, 2020

Understanding Codd's 12 Rules for RDBMS



In response to an online publication of a book appendix regurgitating Codd's 12 famous rules (some of which were, typically, incorrect[1]) I posted earlier a clarification of the rules. This is a revision thereof for better consistency with the new understanding of the RDM based on McGoveran's re-interpretation, extension and formalization[2] of Codd's work.

Saturday, September 26, 2020

TYFK: Nothing to Do With Relational



Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

 

 “The relational calculus is good in describing sets. But it´s bad at describing relations between data in different sets. Explicit identities (primary keys) need to be introduced and normalization is needed to avoid update inconsistencies due to duplication of data. To say it somewhat bluntly: The problem with the relational calculus and RDBMS etc. is the focus on data. It´s seems to be so important to store the data, that connecting the data moves to the background. That might be close to how we store filled in paper forms. But it´s so unlike how the mind works. There is no data stored in your brain. If you look at the fridge in your kitchen, there is no tiny fridge created in your brain so you can take the memory of your fridge with you, when you leave your kitchen.” --Weblogs.asp.net

The lack of foundation knowledge exposed by the above paragraph is so complete that its claims are practically upside down and backwards.

Fundamentals

As we have demonstrated, in mathematical set theory a relation (set) is a subset of a cross-product of domains (sets). In other words, it is a set that is a relationship among sets. Being abstract (i.e., having no real world meaning), the values of mathematical relations can be arbitrary.

The RDM is an application of simple set theory expressible in first order predicate logic (SST/FOPL) to database management: a relational database represents a conceptual model of some reality, namely (facts about) a multigroup in the real world -- a collection of related entity groups -- each database relation representing one such group; a database is also a set of related relations. The values in database relations (i.e., the data) are, thus, not arbitrary, but must be consistent with the conceptual model: relations and the database as a whole are semantically constrainted to be so consistent: (1) individual properties of entities and (2) collective properties of (a) groups (i.e., relationships among entities within groups), and (b) the multigroup (i.e., relationships among groups).

A primary key (PK) represents names given in the real world to entities of a given type, and the corresponding PK constraint (uniqueness) enforces consistency of a relation with the distinguishability of those entities in the real world, the facts about which it represents. These are not RDM artifacts, but rather part of the adaptation of SST/FOPL to database management.

For the primary advantage of the RDM -- guaranteed correctness of query results (i.e., inferences made from the database) -- to materialize, logical database design must adhere to three core principles which, jointly, imply fully normalized relations (5NF).
In fact, in RDM relations are in 5NF by definition, otherwise they are not relations -- relational algebra (RA) operations lose information and all bets are off.

The RA is the manipulative component of the RDM -- a collection of primitive and derived set operations on relations that describe relationships among relations. For example, the join operation r1 JOIN r2 describes a relationship between r1 and r2 relation, the result itself a relation. Note that since every result of a RA operation on even one relation is always a relation and still describes a relationship -- between the "input" and "output" relations.

A data model -- and, industry claims notwithstanding, the only one satisfying Codd's definition that has been formalized is the RDM -- is by nature focused on data. However, the RDM supports physical independence (PI) and, thus, not concerned with how data is physically stored and accessed. The notion of "files stored in paper form" is an example of the common and entrenched logical-physical confusion (LPC) due to failure to understand the distinction between a logical relation and its tabular visualization on a physical medium, induced/reinforced by the industry's "direct image" implementation of SQL DBMSs.

Conclusion


We rephrase the above paragraph as follows:

“The relational algebra describes relationships among relations (sets). Primary keys are one of the adaptations of the SST/FOPL for database management: a PK constraint -- uniqueness -- represents formally in the database a within-group relationship among all its entities.

Mandatory adherence to three core design principles jointly imply full normalization, which is necessary to guarantees correctness of query results. True RDBMSs:

  • Implement the RA for logical data retrieval independent of how the data is physically stored and accessed. SQL DBMSs notwithstanding, vendors are free to store data whichever way they want as long as they don't expose it to users in applications.
  • Enforce relational constraints that are formal database representations of relationships in the conceptual model represented by the database.”

 The "brain" stuff is sheer nonsense.





Friday, August 28, 2020

TYFK: Denormalization Does Not Have Fundamentals



Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).
 
  ““Main Question: How do we trade-off while doing denormalization? 
  • Sub-question 1: the standard to implement
- Do we always have to denormalize a model? For what kind of project must we use denormalization techniques while others may not?
- Since denormalization has its gains and losses, how well should we denormalize a data model? Perhaps, the more complete we denormalize, the more complex, uncertain and poor the situation will be.
  • Sub-question 2: the characteristics of normalization
-Does denormalization have several levels/forms the same as that of normalization? For instance: 1DNF, 2DNF...
- Given we can denormalize a data model, it may never be restored to the original one because to do normalization, one can have many ways while to build a data model, you can have multiple choices in determining entities, attributes, etc.””

In Part 1 we discuss the relevant fundamentals in which we will ground the debunking in Part 2.

Friday, August 7, 2020

OBG: Data Models and Physical Independence




Note: To appreciate the stability of a sound foundation vs the industry's fad-driven cookbook practice, I am re-publishing some of the articles and reader exchanges from the old DBDebunk.com (2000-06), giving you the opportunity to judge for yourself how well my claims/arguments hold up and whether the industry has progressed at all. I am adding comments on re-publication where necessary. Long pieces are broken into smaller parts for fast reading.

From "Little Relationship to Relational" originally posted on March 29, 2001.

 
“E.F. ("Ted") Codd conceived of his relational model for databases while working at IBM in 1969. Codd's approach took a clue from first-order predicate logic, the basis of a large number of other mathematical systems and presented itself [sic] in terms of set theory, leaving the physical definition of the data undefined and implementation dependent. In June of 1970, Codd laid down much of his extensive groundwork for the model in his article, "A Relational Model of Data for Large Shared Data Banks" published in the Communications of the ACM, a highly regarded professional journal published by the Association for Computing Machinery. Buoyed by an intense reaction against the ad hoc data models offered by the physically oriented mainframe databases, Codd's rigid separation of the logical model, with its rigorous mathematical underpinnings, from the less elegant realities of hardware engineering was revolutionary in its day. Codd and his relational ideas blazed across the academic computing landscape over the next few years.”

Sunday, June 28, 2020

TYFK: Misconceptions About the Relational Model



“The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation. The main highlights of this model are:
  • Data is stored in tables called relations.
  • Relations can be normalized, [in which case] values saved are atomic values.
  • Each row in a relation contains a unique value.
  • Each column in a relation contains values from a same domain.”

Each "Test Your Foundation Knowledge" post presents one or more misconceptions about data fundamentals. To test your knowledge, first try to detect them, then proceed to read our debunking, which is based on the current understanding of the RDM, distinct from whatever has passed for it in the industry to date. If there isn't a match, you can acquire the knowledge by checking out our POSTS, BOOKS, PAPERS, LINKS (or, better, organize one of our on-site SEMINARS, which can be customized to specific needs).

Thursday, May 28, 2020

No Such Thing As "Current Relational Data Models"



“... the concept of a state group is indeed a missing modeling concept in relational/current data models...”

Thus in a LinkedIn exchange. I don't know what a "state group" is, but I spent almost six decades debunking the misuses of data model in general and the abuses of the RDM in particular and I smell them from miles away. While the time when lack of foundation knowledge shocked me is long gone, practitioners' total unawareness of and indifference to it, and poor reasoning in a field founded on logic never ceases to amaze me.

What exactly are "relational/current data models"?

Monday, March 23, 2020

TYFK: How (Not) to Compare NoSQL Systems and RDBMSs




Note: About TYFK posts (Test Your Foundation Knowledge) see the post insert below.
“But if you still want to compare NOSQL databases with RDBMS, they primarily vary in
1. "normalization" where RDBMS contains normalized (upto certain degree) data and NOSQL based database contains non-normalized data;
2. RDBMS based databases are (I MUST say, generally and it isn't a criteria) fully ACID compliant while NOSQL databases are partially ACID compliant.
3. RDBMS are much slower and difficult to scale while NOSQL databases are much faster and easily scalable.
4. RDBMS normalization was very useful 50 years ago when cost of disk and memory was high, and computation power was limited. With the revolution in computing power, cheapest disk and memory availability has made RDBMS normalization a matter of joke - many people do not really understand why they need to normalize data in today's time.”
First try to detect the misconceptions, then check against our debunking. If there isn't a match, you can acquire the necessary foundation knowledge in our POSTS, BOOKS, PAPERS, LINKS or, better, organize one of our on-site SEMINARS, which can be customized to specific needs.

Note: In what follows RDBMS refers to a truly relational DBMS (of which currently aren't any), not to be confused with a SQL DBMS.

Friday, December 20, 2019

The RDM and Model Stability





“3rd normal form data models in data warehousing efforts struggle when changes impact parent child relationships. These impacts cause cascading changes to the data model, the queries, and the loading processes. [For example:]
  • There are bank accounts
  • Each account belongs to exactly one customer
  • A customer can have more than one account
The bank introduces a new product: joint accounts, which means an account can now have more than one owner. It is clear that the 3NF model has to be extended in order to keep this new information; the data vault models seems to be able to fulfill the new requirement.

Some banks propose joint accounts, some don’t, therefore some use M:N relation between client and accounts and others 1:N. A model which is good for any possible case is actually awful model because it describes nothing: by looking at this model you can’t say if joint accounts exist among bank's products.”

--Data Vault and Model (in)Stability

Data warehousing/vault[1] are a red herring here -- the real issue is data independence. Some corrections and clarifications first:

  • Normal forms do not pertain to the data model itself -- the RDM -- but to relations in logical models created using strictly the RDM[2].
  • 3NF is insufficient -- relations are in 5NF by definition, otherwise correctness is not guaranteed[3].
  • The RDM was introduced as a database representation superior to old directed graph -- hierarchic and network (CODASYL) -- systems for conceptual models focused on relationships among entity groups, rather than among individual entities[4]. Graph database representation (nodes and edges) corresponds to a worldview at the conceptual level of parents-children (network) relationships, of which parent-children (hierarchy) is a special case. The relational representation (relations) corresponds to M:N relationships among entity groups, of which M:1 is a special case[5].

Note: Correctness -- logical and semantic[6] -- requires adherence to three principles of database design that jointly imply 5NF[7].

Saturday, October 26, 2019

Data Sublanguage Part 4: Conclusion



In Parts 1, 2, and 3  we showed that when the RDM is the data model:
  • A data sublanguage is short for data manipulation language (DML) that combines (1) a relationally complete retrieval component (i.e., that expresses the RA) with (2) a component that expresses updates as relation transformations;
  • A DBMS language is a careful combination, for practical purposes, of the data sublanguage with several sublanguages, each of which expresses a data management function (e.g., data definition, transactions, concurrency, authorizations) -- that are not relational, but are consistent with the RDM, and must not include syntactic elements that are at odds with, or subvert those of the DML.
Note: The RDM is the only data model consistent with Codd's definition that has been formalized [1].

We are now in a position to debunk the two quotes that triggered this series.


Friday, October 11, 2019

Data Sublanguage Part 3: DBMS Language




Note: 10/10/2019 significantly revised Part 2, which requires a re-read.

A formal data model consists of structure, integrity and manipulation[1,2] and so requires (1) a language that expresses data manipulation (retrievals and updates) augmented with (2) a metalanguage used to define the model's structural elements. As we showed in Part 1  and Part 2, when the data model is the RDM:

  • Data sublanguage is short for a relationally complete data manipulation sublanguage (DML) that expresses  retrievals and updates, the latter correctly understood as set-theoretic relation transformations.
  • A data definition sublanguage (DDL) is a metalanguage for DML that is outside the theory but consistent with the RDM and at least as powerful expressively as the DML (e.g., a very carefully restricted SOL to avoid self-referencing).
  • The DML and the DDL can, for practical purposes, be carefully unified into what Codd called a "comprehensive data sublanguage", but we prefer DBMS language to avoid confusion.
By carefully we mean that because only the DML is, strictly speaking, based on relational theory, the DDL must be consistent with, but differentiated from it, such that the two can't be mixed in the same expression in a way that the former subverts the latter. A DML expression can be referenced as a sub-expression by a DDL expression (e.g., as in view definitions), but only if it contains strictly retrievals (e.g., SELECT) and no updates (e.g., INSERT, UPDATE, DELETE).

Note: Data definition and manipulation are possible without a DBMS. However:

“With a relational catalog, definition can be performed via the RA, which requires physical implementation to be determined exclusively by the catalog (behind the scenes as it were) -- a kind of skeletal, primitive, or rudimentary DBMS. This is why Codd created a relational catalog that contains a description of the database and could be managed using RA-based DML. It works well unless one is allowed to mix DDL (metalanguage) with DML (language) in the same expression. Otherwise put, the database can be read to modify the catalog, but not vice-versa (as far as the DML is concerned, the catalog that describes the database does not exist).

But with a data model that, unlike the RDM, does not define a catalog such that the same language can be used for both database and it, a rudimentary DBMS must provide a workaround, and if the model is computationally complete (like CODASYL was), there must limits on how "active" the catalog is to prevent users from writing self-referencing expressions that cannot be automatically implemented because they may corrupt the database (same as would mixing data sublanguage and host language). This is one reason some of the pre-RDM directed graph DBMSs had limited notions of catalog that often required completely separate facilities to maintain.”
                                                 --David McGoveran

The DML and the DDL express two core data management functions centralized in the DBMS. When the data model is the RDM, only data management functions are permitted to access the data. At the end of Part 2 we alluded to other such functions that, like data definition, are outside relational theory, but must be consistent with the RDM.

Friday, September 27, 2019

Data Sublanguage Part 2: Data Manipulation and Definition




Revised 10/10/2019.

In Part 1 we showed that Codd intended in 1969 to base the RDM on axiomatic set theory (AST) and second order logic (SOL) to accommodate relation-valued domains (RVD) (i.e., sets of sets), but that for the benefit of relational advantages and to avoid SOL problems he had to trade off the expressive power of AST/SOL for the simple set theory (SST) of proper sets (i.e., relations in normal form) expressible in first order predicate logic (FOPL) and, thus, computational for relational completeness[1]. He retained the power of the former for applications by hosting a relationally complete FOPL-based language expressing the RA in computationally complete programming languages (CCL).

We also alerted to an important, but unnoticed detail: data sublanguage appeared in the 1970 paper -- in 1969 Codd referred to retrieval sublanguage. This can be understood only with reference to the theoretical foundation of the RDM.


Sunday, September 22, 2019

Data Sublanguage Part 1: Relational vs. Computational Completeness




Note: I have revised the "Logical Access, Data Sublanguage, Kinds of Relations, Database Redundancy, and Consistency" paper in the "Understanding the Real RDM" series" (available from the PAPERS page) for consistency with this post.

“Recently I have read that SQL is actually a data sublanguage and not a programming language like C++ or Java or C# ... The answers ... have the pattern of "No, it is not. Because it's not Turing complete.", etc, etc. ... I am a bit confused, because since you can develop things through SQL, I thought it is similar to other programming languages ... I am curious about knowing why exactly is SQL not a programming language? Which features does it lack? (I know it can't do loops, but what else more?)”
--StackOverflow.com
“The SQL operators were meant to implement the relational algebra as proposed by Dr. Ted Codd. Unfortunately Dr. Codd based some of his ideas on a "extended set theory", which was an idea formulated and described in a 1977 paper by D. L. Childs ... But Childs’ extensions were not ideally suited, which is explained in quite some detail in [a] book ... by Professor Gary Sherman & Robin Bloor [who] argue that mainstream Zermelo-Fraenkel set theory (Cantor), would have been a better starting point. One key issue is that sets should be able to be sets of sets.”
--Dataversity.net

The concept of a sublanguge cannot be understood without foundation knowledge and familiarity with the history of the database management field, both lacking in the industry.

Sunday, April 14, 2019

Understanding Data Modeling Part 1: Models, Models Everywhere, Nor Any Time to Think




“... I needed to know what the constituent parts of data models really are. Across the board, all platforms, all models etc. Is there anything similar to atoms and the (chemical) bonds that enables the formation of molecules? My concerns were twofold ... I wanted a simple, DIY-style, metadata repository for storing 3-level data models -- what would the meta model of such a thing look like? -- [where] atomicity is of essence ... I took a tour (again) in the Data Modeling zone, trying to deconstruct the absolutely essential metadata, which data modelers cannot do without.”
--Thomas Friesendal, The Atoms and Molecules of Data Models, Dataversity.com

All data models? 3-level data models? Platforms? Hhhmmmm!

Wednesday, March 27, 2019

Graph Databases: They Who Forget the Past...




Out of the plethora of misconceptions common in the industry[1], quite a few are squeezed into this paragraph:
“The relational databases that emerged in the ’80s are efficient at storing and analyzing tabular data but their underlying data model makes it difficult to connect data scattered across multiple tables. The graph databases we’ve seen emerge in the recent years are designed for this purpose. Their data model is particularly well-suited to store and to organize data where connections are as important as individual data points. Connections are stored and indexed as first-class citizens, making it an interesting model for investigations in which you need to connect the dots. In this post, we review three common fraud schemes and see how a graph approach can help investigators defeat them.
--AnalyticBridge.DataScienceCentral.com

Relational databases did not emerge in the 80s (SQL DBMSs did);
  • There is no "tabular data" (the relational data structure is the relation, which can be visualized as a table on a physical medium[2], and SQL tables are not relations);
  • Analysis is not a DBMS, but an application function (while database queries, as deductions, are an important aspect of analysis, and computational functions can be added to the data sublanguage (as in SQL), the primary function of a DBMS is data management)[3];
  • A data model has nothing to do with storage (storage and access methods are part of physical implementation, which determines efficiency/performance[4]).

Here, however, we will focus on the current revival (rather than emergence) of graph DBMSs claimed superior -- without any evidence or qualifications -- to SQL DBMSs (not relational, which do not exist) that purportedly "make it difficult to connect data scattered across multiple tables". This is a typical example of how lack of foundation knowledge and of familiarity with the history of the field inhibit understanding and progress[5].


Saturday, January 19, 2019

Data and Meaning Part 4: Query and Result Correctness




As we have seen in Parts 1, 2, and 3, the RDM is a formal theory adapted and applied to database management: database relations (1) preserve the formal properties of mathematical relations, but also (2) have interpretations -- carry a real world meaning assigned by a conceptual model: facts about entities, entity groups, and multigroups (i.e., their properties, some of which are relationships, specified by business rules (BR)). A relation is formally in 5NF and constrained for semantic consistency (i.e., to represent facts about an entity group).
“When we create specific domains, relations, and attributes we are constraining (restricting) an abstract logical system to a specific interpretation (meaning). Seen the other way around, an interpretation of the logical system is a representation of a specific segment of the world, and that is exactly the purpose of database design. For example, an attribute name created by the designer is assigned meaning intended by the modeler as representing an entity property, which is the very meaning of semantics. That is why full normalization cannot be achieved or assessed without reference to some conceptual model -- what attribute names mean, and how they are related to each other (i.e., their dependencies), and so on.” --David McGoveran
Yet requesting and giving design advice without a conceptual model is routine in the industry[1]. What is more, most practitioners are oblivious to the implications for correctness of queries and results[2].
View My Stats