Friday, August 31, 2012

Weekly DBDebunk News



1.
As most of my readers know, for various reasons I stay away from products and will occasionally touch on them only in a very specific context of data and relational fundamentals, in both the negative (e.g. false claims, violations, or errors) and positive (true and correct implementations) sense. For such purposes one needs the very rare combination of thorough product experience in actual practice and knowledge, understanding and appreciation of fundamentals. I happen to know a few such practitioners and I've asked them for input when necessary and they kindly agreed. So in the future expect to see some of their contributions.

Recently somebody requested in a LinkedIn database related group advice on how to select a DBMS. He received a few responses which included some important factors such as cost, support, development community and so on, but all were practically devoid of critical factors such as technical properties and, as one of my advisors pointed out, suitability of those to business needs. As another of my advisors correctly pointed out, this may have been OK several years ago, but not now, when the BigData/NoSQL fad is in full bloom and the circumstances seem to have regressed to the good old pre-relational days of IMS and CODASYL.

I have asked some of my advisors to offer their thoughts on DBMS selection and I am getting some interesting replies which provide evidence for the validity of my gut sense about these "DBMSs", which I will use in the appropriate contexts. It does look, though, like even with my limited knowledge of NoSQL products, the gist of my impression about them seem to have been spot on: "we don't know much about the data and what we logically need to do with it, except that it's a lot of it and we want to store it physically in such ways that will maximize the performance of those operations that we hope to discover we need."

Stay tuned.


2.
I have added a TRDBMS links section (bottom right). It currently includes Dataphor and Rel, both are implementations of Date & Darwen's Tutorial D (which is intended to be a full development language with intergrated relational capabilities). The former is an industrial strength open source TRDBMS used by the Database Consulting Group. The latter is currently an educational vehicle that is being developed into a product. Both are free, so you can get a feel of what for what a truly relational product looks like/can do relative to SQL and the new wave of products claiming to be improvements over it.

Have a good weekend.

Wednesday, August 29, 2012

To Laugh or Cry?



Help designing database table, dbForums

Quote of the Week



Q: normalization of receipt help pls

   DESCRIPTION|QTY|UNIT COST|TOTAL|TRANSACTION TYPE

   i need to convert it to 1st NF, 2nd NF , 3rd NF and BCNF
   pls help i already did a lot of searching in the net. i really cant seem to understand how normalization works.

A: (Expert): Look for E. F. Codd book.

--forums.databasejournal.com

Sunday, August 26, 2012

Schema, NoSQL and the Relational Model Part 3



I started in Part 1 with Haberman's justifiable doubts about "schema-less" NoSQL databases, using the MongoDB docubase as an example; and an exchange on the problematics of a "document data model" with Matt Rogish. In Part 2 I completed the exchange using the example of the W3C effort to define such a model for XML. I am now returning to schema and MongoDB DocBMS.

Thursday, August 23, 2012

Weekly DBDebunk News



1.
The last Laugh or Cry? item was a thread on NULLs at StackExchange.com. A reader commented that he could not tell whether I am "mocking (1) the question (2) the current answers or (3) the StackExchange forum" in the thread. So perhaps I should explain: Laugh or Cry? items are too taxing to debunk and pretty hopeless. But these items are not just entertainment (or pain, as it is for some of us), but also opportunities to test yourself on foundation knowledge. Lack of clarity on what is wrong indicates some gap that needs filling.

Now, of course, not everything is wrong in long threads, but it is precisely the ability to distinguish between sensible pronouncements and nonsense that foundation knowledge confers.

In fact, a thread I debunked at the old site is an excellent illustration of what I mean. Stay tuned.

UPDATE: One more reason for old posts: despite all the claims of "progress", some are evidence to the contrary: that fundamental problems are not resolved and that there is even regress in that progress.

2.
I last revised the papers in January 2011. The current revisions are major expansions/overhauls which I may well renumber version 1. The first paper I announced earlier, Business Modeling for Database Design, required a bit
more effort than I initially thought, so I have delayed the publication a bit for further refinements--it'll be worth the wait.

I am also testing the waters of self-publishing on Kindle --no small hassle--with the kind help of Charlie Clark (thanks, Charlie).

The seminars are undergoing a similar overhaul. I highly recommend them ( and I am not biased! :)). Go ahead, organize one.

3.
If you noticed, I am interleaving some posts from the old site with new ones. This is not just laziness or to fill space. I am soon going to drop the old site and I am selecting very carefully those that are too valuable to lose.

Besides, their revision does not take much less time and effort than writing new ones.

I have now added the original post date and dated the revised section to all of them, for proper context (thanks, Yiorgos, for suggesting it).

Have a nice weekend.

Wednesday, August 22, 2012

To Laugh or Cry?



Representative Table,The Daily WTF

Quote of the Week



The First Normal Form: Stipulates that no repeating groups or sets should be stored within a table. These similar groups should be stored in a separate table. To identify each unique column of these tables, a primary key should be established

The Second Normal Form: Data Redundancy should be non-existent. Foreign key constraints should be enforced.

The Third Normal Form: Every column in the table should be related and dependant on the primary key of the table. If this is not possible, the field should be stored in a new table with a new key.

The Fourth Normal Form: This one would probably exists somewhere in dream land - the elimination of independent relationships in a Relational Database.

The Fifth Normal Form: Exists in never-never land - it calls for a perfect segregation of logically related  many-to-many relationships.

So now you know something about relationships (that's what our whole Relational Database thing is about right?). But just keep in mind that as we increase and tighten our relationship enforcements, there would be a little trade off with performance.--George Alexander, A fundamental approach to Database design

Sunday, August 19, 2012

SQL Sins




Ovid: I've recently purchased the book DATABASE IN DEPTH by C. J. Date and I've been enjoying it very much.  It's been quite an eye-opener to discover that most of my gripes about "databases" were actually related to SQL.

To that end, I've accidentally started a lively discussion about the "merits" of SQL at Why SQL Sucks (with a little Perl to fix it). dbdebunk.com is getting mentioned quite a bit (and not always in a favorable light, I might add).  (Mine is the root post in the thread). Admittedly, I'm just learning about many of the issues involved. So despite years of working with databases, I may have some of the basics off a bit.

The reason I mention this is because that site is very high profile amongst Perl programmers and convincing the people at that site about the issues with SQL would go a long way to reaching a broad swath of the tech world. 


Friday, August 17, 2012

Weekly DBDebunk News



1. My first post for my new Data Fundamentals column at TechWeb's AllAnalytics:

Knowing What a Database Is

2. Last week I reported some Blogger bugs. One of them changes the URL's of the static pages when they are updated. The Seminars, Papers, LaughCry, Quotes and About are static pages, which means that I won't update them until Google fixes it, although I will continue to post updates in the blog.
If such an update scrolls before you read it you will have to go back to it to read it. I apologize for the inconvenience.

I am keeping a list of the pertinent posts and will update the pages as soon as the problem is solved.

3. Oops, forgot. Some of you may be experiencing broken links. I have a bunch of posts in draft mode in the pipe and and at one point I mistakenly posted all of them and had to take them off. Some of you are accessing those dead links. But not to worry: all of them will be posted in time, although they may have different URL.Again, sorry for the inconvenience, chuck it to birth pains.

Have a nice weekend.

Wednesday, August 15, 2012

To Laugh or Cry?



Database table and NULLs, stackexchange.com

Quote of the Week



In the database world, the raw physical data model is at the center of the universe, and queries freely assume intimate details of the data representation (indexes, statistics, metadata). This closed-world assumption and the resulting lack of abstraction have the pleasant effect of allowing the data to outlive the application. On the other hand, this makes it hard to evolve the underlying model independently from the queries over the model.

As the move to the cloud puts pressure on the closed-world assumption of the database, exposing naked data and relying on declarative magic becomes a liability rather than an asset. In the cloud, the roles are reversed, and objects should hide their private data representation, exposing it only via well-defined behavioral interfaces...

The world of database models is noun-based, talking about Customers, Orders, LineItems, etc. Once modelers have designed the data model correctly, they consider their job done.

In the realm of modelers, there is no notion of data abstraction that separates abstract properties of the model from the concrete details of the fully normalized realization in terms of tables with PK/FK (primary-key/foreign-key) relationships. --All Your Database Are Belong to Us, Erik Meijer

Schema, NoSQL and the Relational Model Part 2



Part 1 ended with my following comment to Matt Rogish, on the subject of a document data model:
As Codd realized, to do database management you must have some data model, period! You cannot do it without one. Indeed, a schema is based on it.

So in order to design a document database system of the kind you envision you must first define a document data model: structure, manipulation and integrity. What exactly is it?
Part 2 continues my response to Matt, using an exchange between me and Hugh Darwen to illustrate  what happened when an attempt was made by a W3C committee to come up with an XML document data model.

Friday, August 10, 2012

Weekly DBDebunk News (UPDATED)



Welcome and thanks to readers, subscribers, well wishers and all expressions of interest and support.

1. Except for some Blogger bugs, the blog format seems to be working better than the old site, so at some point in the next few weeks I will drop the latter and probably shift the blog to the old domain. As far as I can tell, the current blogspot URL will continue to work. I will keep you posted.

2. Currently my top priority is to revise and expand my papers, seminars and to add to them. I am also experimenting with self-publishing on Kindle, which proves to be a hassle. This will keep me busy for a while, so if I  am not as responsive/active as I would like to be, thanks for your patience.

3. As a result of comments to two of my posts that identified some problems, I have taken them offline for revision and I will repost them as soon as I can.
Looks like I am a bit rusty after so many years in the wilderness. I apologize for the inconvenience. (UPDATE: My revised first post has been republished).

Please feel free to contact me at the address on the About page with submissions for debunking, questions, comments or whateve.

It's good to be back.
FP

Sunday, August 5, 2012

Schema, NoSQL and the Relational Model, Part 1



In What's Wrong with the Schema? Stephen Haberman is
...wondering why schema-less databases are so popular lately. Most any NoSQL store is schema-less. And while perhaps schema-less-ness is an integral part of NoSQL (e.g. most NoSQL databases are just opaque key/value stores), I would assert it’s an orthogonal concern, and that document-oriented databases, e.g. MongoDB, could arguably have a schema. However, MongoDB doesn’t just say it’s schema-less for technical reasons related to being a NoSQL store, it actually touts its lack of a schema as a benefit, claiming it is “agile” and offers “simplicity and power”. I find all of this confusing, as I actually want a schema.

Friday, August 3, 2012

To Laugh or Cry (UPDATED)



Readers of my old site may remember the "To Laugh or Cry?" feature. It was a series of links to material to which it was difficult to decide which of the two reactions would be justified.

Note: Weekly Quotes and To Laugh or Cry? items are archived on two pages accessible from the top menu, so that they can be perused after the posts scroll.  

Here is the first link initiating the series here.  

Comment On SQL Split, The Daily WTF

Wednesday, August 1, 2012

Quote of the Week



The relational calculus is good in describing sets. But it´s bad at describing relations between data in different sets. Explicit identities (primary keys) need to be introduced and normalization is needed to avoid update inconsistencies due to duplication of data.

To say it somewhat bluntly: The problem with the relational calculus and RDBMS etc. is the focus on data. It´s seems to be so important to store the data, that connecting the data moves to the background.

That might be close to how we store filled in paper forms. But it´s so unlike how the mind works.

There is no data stored in your brain. If you look at the fridge in your kitchen, there is no tiny fridge created in your brain so you can take the memory of your fridge with you, when you leave your kitchen. --Ralf's Sudelbücher, Musings on relations - or: WinFS is not enough

Re-write




See: Physical Independence Part 1: Don't Mix Model with Implementation



View My Stats