Monday, July 30, 2018

Lenin, Trotsky, Data Management, and the Tyranny of Knowledge and Reason -- Version 2

14 years ago I published an editorial with this title in response to an exchange triggered by one of my articles. The objective was to illustrate some of the damage that the collapse of education has done to the IT industry, reflecting a trend in society as a whole. Things have only gotten worse since then, and I had been considering a rewrite for quite a while. A recent LinkedIn exchange provided the impetus, so here it is.

Monday, July 23, 2018

Industry Practice Is No Substitute for Foundation Knowledge

“A short time ago a colleague asked me where he could find a "Databases 101" guide for the non-technical professional. As it turns out, the internet is littered with information, and mis-information, regarding data and databases. This makes it difficult for someone entering the field of data and databases to understand the bigger picture. He wanted something to help make sense of data and databases, specifically relational versus NoSQL. So, that’s what I decided to work on. This is the start of my Databases 101 guide for the non-database professional; something to help anyone understand why the word “database” is an overloaded term. We use the term "database" to describe a great many things ... I use the terms database and database engine interchangeably in this article.”
                            --Databases 101,

To “understand the bigger picture” and “make sense of data and databases” requires foundation knowledge. In its absence, efforts to educate defeat their own purpose, because they are grounded in the very misinformation they should be correcting. This is the mechanism by which misconceptions[1] are perpetuated and reinforced.

Sunday, July 15, 2018

Understanding Relations Part 3: Debunking Conventional Wisdom

(See Part 1 and Part 2)

“A common term used in database design is a "relational database" -- but a database relation is not the same thing and does not imply, as its name suggests, a relationship between tables. Rather, a database relation simply refers to an individual table in a relational database. In a relational database, the table is a relation because it stores the relation between data in its column-row format. The columns are the table's attributes, while the rows represent the data records. A single row is known as a tuple to database designers.”
“A relation, or table, in a relational database has certain properties.”

“First off, its name must be unique in the database, i.e. a database cannot contain multiple tables of the same name.”

“Next ... as with the table names, no attributes can have the same name.”

“Next, no tuple (or row) can be a duplicate. In practice, a database might actually contain duplicate rows, but there should be practices in place to avoid this, such as the use of unique primary keys (next up). Given that a tuple cannot be a duplicate, it follows that a relation must contain at least one attribute (or column) that identifies each tuple (or row) uniquely. This is usually the primary key. This primary key cannot be duplicated. This means that no tuple can have the same unique, primary key. The key cannot have a NULL value, which simply means that the value must be known.”

“Further, each cell, or field, must contain a single value. For example, you cannot enter something like "Tom Smith" and expect the database to understand that you have a first and last name; rather, the database will understand that the value of that cell is exactly what has been entered.”

“Finally, all attributes—or columns—must be of the same domain, meaning that they must have the same data type. You cannot mix a string and a number in a single cell.”

“All these properties, or constraints, serve to ensure data integrity, important to maintain the accuracy of data.”
  --Definition of Database Relation

It is easy to discern when explanations of relational features are grounded not in the formal foundations of the RDM[1] but in industry practices. Here are some further clarifications and corrections.

Wednesday, July 4, 2018

N-ary vs. Binary Relations: Anatomy of a Compound Misconception

Note: This is a rewrite of an older post, bringing it into line with the McGoveran formalization and interpretation [1] of Codd's true RDM, which includes refinements, corrections, and extensions of his own.
“Codd has based his model on n-ary relations and that is the key mistake he has made; that leads to complex structure (absolutely not necessary) and situations where there are no values known and as a consequence the need of the concept we know too well -> the null pointers; binary relations (smallest possible) are sufficient to express any predicate/sententional formula and there is no possibility to have something like null; if a value is unknown then we do not know it thus it is not a fact for us thus it is not in our database.”
We suggest you test your foundation knowledge by figuring out what's wrong with this picture before you proceed.