Friday, August 31, 2012

Weekly DBDebunk News

1.
As most of my readers know, for various reasons I stay away from products and will occasionally touch on them only in a very specific context of data and relational fundamentals, in both the negative (e.g. false claims, violations, or errors) and positive (true and correct implementations) sense. For such purposes one needs the very rare combination of thorough product experience in actual practice and knowledge, understanding and appreciation of fundamentals. I happen to know a few such practitioners and I've asked them for input when necessary and they kindly agreed. So in the future expect to see some of their contributions.

Recently somebody requested in a LinkedIn database related group advice on how to select a DBMS. He received a few responses which included some important factors such as cost, support, development community and so on, but all were practically devoid of critical factors such as technical properties and, as one of my advisors pointed out, suitability of those to business needs. As another of my advisors correctly pointed out, this may have been OK several years ago, but not now, when the BigData/NoSQL fad is in full bloom and the circumstances seem to have regressed to the good old pre-relational days of IMS and CODASYL.

I have asked some of my advisors to offer their thoughts on DBMS selection and I am getting some interesting replies which provide evidence for the validity of my gut sense about these "DBMSs", which I will use in the appropriate contexts. It does look, though, like even with my limited knowledge of NoSQL products, the gist of my impression about them seem to have been spot on: "we don't know much about the data and what we logically need to do with it, except that it's a lot of it and we want to store it physically in such ways that will maximize the performance of those operations that we hope to discover we need."

Stay tuned.


2.
I have added a TRDBMS links section (bottom right). It currently includes Dataphor and Rel, both are implementations of Date & Darwen's Tutorial D (which is intended to be a full development language with intergrated relational capabilities). The former is an industrial strength open source TRDBMS used by the Database Consulting Group. The latter is currently an educational vehicle that is being developed into a product. Both are free, so you can get a feel of what for what a truly relational product looks like/can do relative to SQL and the new wave of products claiming to be improvements over it.

Have a good weekend.

Wednesday, August 29, 2012

To Laugh or Cry?

Help designing database table, dbForums

Quote of the Week

Q: normalization of receipt help pls

   DESCRIPTION|QTY|UNIT COST|TOTAL|TRANSACTION TYPE

   i need to convert it to 1st NF, 2nd NF , 3rd NF and BCNF
   pls help i already did a lot of searching in the net. i really cant seem to understand how normalization works.

A: (Expert): Look for E. F. Codd book.

--forums.databasejournal.com

Sunday, August 26, 2012

Schema, NoSQL and the Relational Model Part 3

I started in Part 1 with Haberman's justifiable doubts about "schema-less" NoSQL databases, using the MongoDB docubase as an example; and an exchange on the problematics of a "document data model" with Matt Rogish. In Part 2 I completed the exchange using the example of the W3C effort to define such a model for XML. I am now returning to schema and MongoDB DocBMS.

Thursday, August 23, 2012

Weekly DBDebunk News

1.
The last Laugh or Cry? item was a thread on NULLs at StackExchange.com. A reader commented that he could not tell whether I am "mocking (1) the question (2) the current answers or (3) the StackExchange forum" in the thread. So perhaps I should explain: Laugh or Cry? items are too taxing to debunk and pretty hopeless. But these items are not just entertainment (or pain, as it is for some of us), but also opportunities to test yourself on foundation knowledge. Lack of clarity on what is wrong indicates some gap that needs filling.

Now, of course, not everything is wrong in long threads, but it is precisely the ability to distinguish between sensible pronouncements and nonsense that foundation knowledge confers.

In fact, a thread I debunked at the old site is an excellent illustration of what I mean. Stay tuned.

UPDATE: One more reason for old posts: despite all the claims of "progress", some are evidence to the contrary: that fundamental problems are not resolved and that there is even regress in that progress.

2.
I last revised the papers in January 2011. The current revisions are major expansions/overhauls which I may well renumber version 1. The first paper I announced earlier, Business Modeling for Database Design, required a bit
more effort than I initially thought, so I have delayed the publication a bit for further refinements--it'll be worth the wait.

I am also testing the waters of self-publishing on Kindle --no small hassle--with the kind help of Charlie Clark (thanks, Charlie).

The seminars are undergoing a similar overhaul. I highly recommend them ( and I am not biased! :)). Go ahead, organize one.

3.
If you noticed, I am interleaving some posts from the old site with new ones. This is not just laziness or to fill space. I am soon going to drop the old site and I am selecting very carefully those that are too valuable to lose.

Besides, their revision does not take much less time and effort than writing new ones.

I have now added the original post date and dated the revised section to all of them, for proper context (thanks, Yiorgos, for suggesting it).

Have a nice weekend.

Wednesday, August 22, 2012

To Laugh or Cry?

Representative Table,The Daily WTF

Quote of the Week

The First Normal Form: Stipulates that no repeating groups or sets should be stored within a table. These similar groups should be stored in a separate table. To identify each unique column of these tables, a primary key should be established

The Second Normal Form: Data Redundancy should be non-existent. Foreign key constraints should be enforced.

The Third Normal Form: Every column in the table should be related and dependant on the primary key of the table. If this is not possible, the field should be stored in a new table with a new key.

The Fourth Normal Form: This one would probably exists somewhere in dream land - the elimination of independent relationships in a Relational Database.

The Fifth Normal Form: Exists in never-never land - it calls for a perfect segregation of logically related  many-to-many relationships.

So now you know something about relationships (that's what our whole Relational Database thing is about right?). But just keep in mind that as we increase and tighten our relationship enforcements, there would be a little trade off with performance.--George Alexander, A fundamental approach to Database design

Sunday, August 19, 2012

SQL Sins


Ovid: I've recently purchased the book DATABASE IN DEPTH by C. J. Date and I've been enjoying it very much.  It's been quite an eye-opener to discover that most of my gripes about "databases" were actually related to SQL.

To that end, I've accidentally started a lively discussion about the "merits" of SQL at Why SQL Sucks (with a little Perl to fix it). dbdebunk.com is getting mentioned quite a bit (and not always in a favorable light, I might add).  (Mine is the root post in the thread). Admittedly, I'm just learning about many of the issues involved. So despite years of working with databases, I may have some of the basics off a bit.

The reason I mention this is because that site is very high profile amongst Perl programmers and convincing the people at that site about the issues with SQL would go a long way to reaching a broad swath of the tech world. 


Friday, August 17, 2012

Weekly DBDebunk News

1. My first post for my new Data Fundamentals column at TechWeb's AllAnalytics:

Knowing What a Database Is

2. Last week I reported some Blogger bugs. One of them changes the URL's of the static pages when they are updated. The Seminars, Papers, LaughCry, Quotes and About are static pages, which means that I won't update them until Google fixes it, although I will continue to post updates in the blog.
If such an update scrolls before you read it you will have to go back to it to read it. I apologize for the inconvenience.

I am keeping a list of the pertinent posts and will update the pages as soon as the problem is solved.

3. Oops, forgot. Some of you may be experiencing broken links. I have a bunch of posts in draft mode in the pipe and and at one point I mistakenly posted all of them and had to take them off. Some of you are accessing those dead links. But not to worry: all of them will be posted in time, although they may have different URL.Again, sorry for the inconvenience, chuck it to birth pains.

Have a nice weekend.

Wednesday, August 15, 2012

To Laugh or Cry?

Database table and NULLs, stackexchange.com

Quote of the Week

In the database world, the raw physical data model is at the center of the universe, and queries freely assume intimate details of the data representation (indexes, statistics, metadata). This closed-world assumption and the resulting lack of abstraction have the pleasant effect of allowing the data to outlive the application. On the other hand, this makes it hard to evolve the underlying model independently from the queries over the model.

As the move to the cloud puts pressure on the closed-world assumption of the database, exposing naked data and relying on declarative magic becomes a liability rather than an asset. In the cloud, the roles are reversed, and objects should hide their private data representation, exposing it only via well-defined behavioral interfaces...

The world of database models is noun-based, talking about Customers, Orders, LineItems, etc. Once modelers have designed the data model correctly, they consider their job done.

In the realm of modelers, there is no notion of data abstraction that separates abstract properties of the model from the concrete details of the fully normalized realization in terms of tables with PK/FK (primary-key/foreign-key) relationships. --All Your Database Are Belong to Us, Erik Meijer

Schema, NoSQL and the Relational Model Part 2

Part 1 ended with my following comment to Matt Rogish, on the subject of a document data model:
As Codd realized, to do database management you must have some data model, period! You cannot do it without one. Indeed, a schema is based on it.

So in order to design a document database system of the kind you envision you must first define a document data model: structure, manipulation and integrity. What exactly is it?
Part 2 continues my response to Matt, using an exchange between me and Hugh Darwen to illustrate  what happened when an attempt was made by a W3C committee to come up with an XML document data model.

Sunday, August 12, 2012

Domain vs. Type, Class vs. Relation

What's wrong with last week's picture (update of August 2012)
"Our terminology is broken beyond repair. [Let me] point out some problems with Date's use of terminology, specifically in two cases.
  1. "type" = "domain": I fully understand why one might equate "type" and "domain", but ... in today's programming practice, "type" and "domain" are quite different. The word "type" is largely tied to system-level (or "physical"-level) definitions of data, while a "domain" is thought of as an abstract set of acceptable values.
  2. "class" != "relvar": In simple terms, the word "class" applies to a collection of values allowed by a predicate, regardless of whether such a collection could actually exist. Every set has a corresponding class, although a class may have no corresponding set ... in mathematical logic, a "relation" is a "class" (and trivially also a "set"), which contributes to confusion.
In modern programming parlance "class" is generally distinguished from "type" only in that "type" refers to "primitive" (system-defined) data definitions while "class" refers to higher-level (user-defined) data definitions. This distinction is almost arbitrary, and in some contexts, "type" and "class" are actually synonymous."
With respect to 1, well, yes, they are distinct, but not for the stated reason. With respect to 2, well, no insofar as "programming parlance" goes. The terminology introduced by Codd was explicitly intended to distinguish formal concepts from set theory and first order predicate logic from the terminology used in programming practice. 

What Is a Relational Table? URL Changed

I mentioned in my weekly news blog Blogger problems.

I just found out that the URL of a post that is taken offline, revised and its title changed, its URL changes. How stupid is that?

There is a similar problem with URL's of static pages: when they are updated, random numeric numbers are suffixed to them.

Until these problems are solved (don't hold your breath) I will not update my static pages and I will not be able to take posts offline, revise them and add UPDATED to their title.

Here are the changed URL of the post I revised:

What Is a Relational Table

Friday, August 10, 2012

Weekly DBDebunk News (UPDATED)

Welcome and thanks to readers, subscribers, well wishers and all expressions of interest and support.

1. Except for some Blogger bugs, the blog format seems to be working better than the old site, so at some point in the next few weeks I will drop the latter and probably shift the blog to the old domain. As far as I can tell, the current blogspot URL will continue to work. I will keep you posted.

2. Currently my top priority is to revise and expand my papers, seminars and to add to them. I am also experimenting with self-publishing on Kindle, which proves to be a hassle. This will keep me busy for a while, so if I  am not as responsive/active as I would like to be, thanks for your patience.

3. As a result of comments to two of my posts that identified some problems, I have taken them offline for revision and I will repost them as soon as I can.
Looks like I am a bit rusty after so many years in the wilderness. I apologize for the inconvenience. (UPDATE: My revised first post has been republished).

Please feel free to contact me at the address on the About page with submissions for debunking, questions, comments or whateve.

It's good to be back.
FP

Sunday, August 5, 2012

Schema, NoSQL and the Relational Model, Part 1

In What's Wrong with the Schema? Stephen Haberman is
...wondering why schema-less databases are so popular lately. Most any NoSQL store is schema-less. And while perhaps schema-less-ness is an integral part of NoSQL (e.g. most NoSQL databases are just opaque key/value stores), I would assert it’s an orthogonal concern, and that document-oriented databases, e.g. MongoDB, could arguably have a schema. However, MongoDB doesn’t just say it’s schema-less for technical reasons related to being a NoSQL store, it actually touts its lack of a schema as a benefit, claiming it is “agile” and offers “simplicity and power”. I find all of this confusing, as I actually want a schema.

Friday, August 3, 2012

To Laugh or Cry (UPDATED)

Readers of my old site may remember the "To Laugh or Cry?" feature. It was a series of links to material to which it was difficult to decide which of the two reactions would be justified.

Note: Weekly Quotes and To Laugh or Cry? items are archived on two pages accessible from the top menu, so that they can be perused after the posts scroll.  

Here is the first link initiating the series here.  

Comment On SQL Split, The Daily WTF

Wednesday, August 1, 2012

Quote of the Week

The relational calculus is good in describing sets. But it´s bad at describing relations between data in different sets. Explicit identities (primary keys) need to be introduced and normalization is needed to avoid update inconsistencies due to duplication of data.

To say it somewhat bluntly: The problem with the relational calculus and RDBMS etc. is the focus on data. It´s seems to be so important to store the data, that connecting the data moves to the background.

That might be close to how we store filled in paper forms. But it´s so unlike how the mind works.

There is no data stored in your brain. If you look at the fridge in your kitchen, there is no tiny fridge created in your brain so you can take the memory of your fridge with you, when you leave your kitchen. --Ralf's Sudelbücher, Musings on relations - or: WinFS is not enough

Physical Data Independence

CB: In your writings, you have made the point that a data model is something that's logical (e.g. orthogonal to its actual physical implementation). Does that mean, for example, that a DBMS could be truly relational if, under the hood, it is physically built upon such traditional programming constructs as linked lists, pointers, and arrays? And even if so - are you aware of a better way to develop a TRDBMS without using such constructs?