Sunday, August 2, 2015

Weekly Update

1. Quote of the Week
I am designing a mySQL database. I created tables and added extra columns for future use. Will it affect performance?
2. To Laugh or Cry?
Why you should never, ever, ever use MongoDB
3. Online Debunkings
Fixing 7 common database design errors
4. From the industry
Amazon's MySQL database challenger Aurora exits preview
5. And now for something completely different

Sunday, July 26, 2015

Interpreting Codd: Normalization

One of the most common and entrenched misconceptions about the RDM is the confusion of its sound theoretical foundation with "just a theory", implying a lack of practicality. Not many realize that the RDM includes several adaptations Codd made to set theory to make it applicable to database practice.

In 1997, a computer science course at an Australian university posted on the Net a Q&A discussion of Codd's work. The answers were actually closer to the truth than what can be expected today, and with a few corrections and clarifications they can be useful. Readers are encouraged to test themselves by assessing the answers before proceeding to my comments.
Q: When we normalise, we remove non-simple domains. However, in doing so, we create a lot of duplication. I was under the impression that we should try to minimise duplication as it is hard to keep data consistent and it also doesn't waste space. So what is so special about normalising?
A: The first advantage of normalising is that it renders all values atomic, thus simplifying all data structures: a huge advantage for storage and communication purposes. It is true that as a result of normalisation (in the sense of removing non-simple domains) duplication is introduced, but the process of normalisation is at the very heart of the relational model, i.e., atomic units in n-column homogeneous arrays. Therefore, a small element of redundant information is introduced for this advantage. Redundancy can be further reduced by other degrees of normalisation.
One of the most entrenched and frequent complaints is that the RDM is "just theory" that often clashes with practical database needs in the real world. Data professionals are mostly unaware that Codd made many adjustments to the abstract mathematical theory of relations to adapt it to real-world database management (that is why I insist on the distinction between relations and R-tables). One such adjustment is keys--PKs and FKs--which relations do not have.
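To make the exchange above concrete, here is a minimal sketch of what "removing non-simple domains" does and where the duplication the questioner worries about comes from. Everything in it is an illustrative assumption rather than something taken from the course Q&A: the employee/job-history data, the attribute names (`emp_no`, `job_history`) and the helper `normalize` are all hypothetical, and Python sets of tuples only stand in for R-tables.

```python
# Hypothetical unnormalized data: each employee row carries a non-simple
# (relation-valued) attribute, job_history.
employees_nested = [
    {"emp_no": 1, "name": "Jones",
     "job_history": [("1990-01-01", "Clerk"), ("1993-06-01", "Analyst")]},
    {"emp_no": 2, "name": "Smith",
     "job_history": [("1995-03-15", "Engineer")]},
]


def normalize(nested):
    """Split rows with a relation-valued attribute into two sets of
    tuples whose values are all atomic."""
    employee = set()     # (emp_no, name); emp_no plays the role of the PK
    job_history = set()  # (emp_no, job_date, title); emp_no acts as an FK
    for row in nested:
        employee.add((row["emp_no"], row["name"]))
        for job_date, title in row["job_history"]:
            # The parent's key value is repeated in every child tuple.
            # This repetition is the duplication normalization introduces.
            job_history.add((row["emp_no"], job_date, title))
    return employee, job_history


employee, job_history = normalize(employees_nested)

# A referential check in the spirit of an FK constraint: every emp_no
# that appears in job_history must also appear in employee.
employee_keys = {emp_no for emp_no, _ in employee}
fk_holds = all(emp_no in employee_keys for emp_no, _, _ in job_history)
```

Note that the only values repeated are key values (`emp_no` appears once per job-history tuple), and that repetition is what lets the two sets be related by value instead of by nesting. It is also where keys, which abstract relations lack, start doing practical work.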

Saturday, July 18, 2015

Weekly Update


  • New Appendix to paper #3: While working on my book, I collected all comments by readers and replies by me (edited) and David McGoveran and added them as Appendix B. It further clarifies some of the aspects of the proposed relational/2VL solution to missing data. Those who ordered the paper in 2014 and 2015 should email me for a copy.
  • Added to LINKS page: 
Why even the most intelligent software architects don't understand the Relational Model

1. Quotes of the Week
In 15-20 years from now: Information will stay only in XML (no more tuples, no more objects). Imperative languages as we know them today (Java, C, C++, C#) will be gone. We will program with some extension of XQuery, or in any case a declarative dataflow/workflow language specially --Daniela Florescu, 2010 Interview
Exactly 20 years ago I wrote this article: "Storing and Querying XML Data using an RDMBS". I curse myself every day for doing so. I should be damned by the fires of hell for ever opening my mouth and letting people believe that one can REASONABLY use SQL to query hierarchical, complex structures like XML or JSON. NO, PEOPLE. YOU CAN NOT! --Daniela Florescu, 2015
2. To Laugh or Cry?
SQL Will Inevitably Come To NoSQL Databases
3. Online Debunkings
Data Scientists: The talent crunch (that isn't)
4. Interesting
5. And now for something completely different

Thursday, July 9, 2015

The First Half of Database Science for Analysts

My July blog @All Analytics:

Database Fundamentals: The First Half of Database Science for Analysts

One would expect “data scientists” to be keen on the dual scientific foundation of database management -- the relational data model (RDM) -- but they know little beyond “related tables” and, in fact, complain that more often than not data “do not fit” into them. Much of that is the result of poor education and an almost exclusive focus on software tool training. Even the analyst intent on acquiring foundation knowledge is more likely to be misled than enlightened by published information.

Please comment there, not here!


Sunday, July 5, 2015

The SQL and NoSQL Effects: Will They Ever Learn? UPDATED

UPDATE: I refer readers to Apache Cassandra … What Happened Next. Note that this was an optimal use case for NoSQL. Read it with a focus on the simplicity of the data model and, particularly, on physical data independence relative to the RDM.

In Oracle and the NoSQL Effect, Robin Schumacher (RS), a former "data god" DBA and MySQL executive now working for a NoSQL vendor, claims that Oracle’s recent fiscal Q4 miss--a fraction of what's to come--is due to its failure to recognize that
"... web apps ushered in a new model for development and distributed systems that ... [r]elational databases are fundamentally ill suited to handle ... Their master-slave architectures, methods for writing and reading data, and data distribution mechanisms simply cannot meet the key requirements of modern web, mobile and IoT applications. I tell you that not as an employee of a NoSQL company, but as a guy who has worked with RDBMS’s for over twenty-five years. In short, you simply can’t get there from here where relational technology is concerned, and that’s why NoSQL must be used for the applications we’re talking about."

Sunday, June 28, 2015

Weekly Update

1. Quote of the Week
My feeling is that the field of NoSQL was created EXACTLY so the data should not be normalized like in relational databases--which has the disadvantages that data needed for real time/online applications needed to be joined at runtime before being used by the application. Under the time constraints of an online system, this is unacceptable. Hence, application developers want to store persistently the data EXACTLY in the way application see it: pre-aggregated, potentially inconsistent, and potentially replicated. Bottom line, there is no "rule" of how you should store the data. Just look at your application needs. Not everyone has the same requirements as iTunes or Netflix, so you don't need to copy their design.
If this is a question for you... maybe you shouldn't be using a NoSQL database in the first place !? Why do you think you need one and good old relational databases aren't good for you? Just because it's "fashionable" ? My point is: if you knew exactly WHY you need a NoSQL database, you knew EXACTLY how to structure your data for it.
With consistency gone, whatever is left?

2. To Laugh or Cry?

Data Modeling in NoSQL
3. Online Debunkings 
4. Elsewhere 
5. Added to LINKS page:
6. And now for something completely different