Saturday, December 29, 2012

Site Update



1.
The new Quote of the Week posted on the QUOTES page is a complement to the previous quote and is by the same author.

2.
Both the quote and the new To Laugh or Cry? item posted to the LAUGH/CRY? page have to do with the author referenced in my prvious post, The Clouding Syndrome. Does he really think that titling his article the way he did elevates him to Codd's level?

My friend David McGoveran has some comments forthcoming, but in the meantime perhaps this is closer to reality.

3.
Added to FP ONLINE page:
  • My latest All Analytics column.
  • A LinkedIn exchange in which I participated (as of this writing my comments are not posted yet). One of the participants recommended the very same article by Meijer mentioned in 1.
4.
For quite a while I just could not display a LinkedIn Share button after each post, next to the other buttons provided by Blogger, no matter what I did and neither Google, nor LinkedIn helped. I was finally able to display it at the bottom of posts, but not exactly where I wanted. I am working on relocating it, but in the meantime it is usable where it is.



Tuesday, December 25, 2012

The Clouding Syndrome (REVISED)



Over the years I got enormous and nasty flak for my arguments about the sorry state of foundation knowledge in the industry, the contamination of academia by it and the increasing deterioration in both.  On one occasion, my claim that the designers of SQL did not really understand the relational model was dismissed as utter nonsense. But if that made me a crank, so were Ted Codd and Chris Date.

My friend Jim Lowden emailed me:
Last month [there] was a hopeless article All Your Database Are Belong to Us about DBMSs [in CACM] so badly mistaken, that it was too dreary even to reply. [The author] managed to get the Closed World Assumption pretty much backwards, if that's possible.

Wednesday, December 19, 2012

Site Update



1.
A new Quote of the Week was posted on the QUOTES page.

2.
A new To Laugh or Cry? item was posted on the LAUGH/CRY? page.

3.
A link to an exchange in which I participated was posted on the FP ONLINE page.

4.
I have added APPLIED MATHEMATICS FOR DATABASE PROFESSIONALS by Toon Koppelaars and Lex Haan to the recommended books widget on the home page.


Sunday, December 16, 2012

"Schema-less Models" and the New World Disorder



There is a tendency in the database field to distort, use poorly defined
terminology, or use it inconsistently.  As I already argued in previous posts, technologies that are founded or used without foundation knowledge and understanding will prove costly fads.

Wednesday, December 12, 2012

Site Update



1.
A new Quote of the Week was posted on the QUOTES page. Consider it in the context of my posts on database vs. application enforced integrity.

2.
A new To Laugh or Cry? item was posted on the LAUGH/CRY? page.


Wednesday, December 5, 2012

Site Update (UPDATED)



1.
A new Quote of the Week was posted on the QUOTES page.
I contend that in a certain sense the perspective is upside down and backwards. Can you figure out why?

2.
A new To Laugh or Cry? item was posted on the LAUGH/CRY? page.

3.
My latest Data Fundamentals column and a thread I participated in were posted on the ONLINE page.

4.
From Erwin Smout (via email):
In 2009, the ACM re-published Codd's paper Derivability, redundancy and consistency of relations stored in large data banks.

For searching purposes, the ACM "classifies" each published article using some "taxonomy" of their own making. Apparently, Codd's paper is classified under "Database Administration" and not under "Relational Database Model".
Says a lot, doesn't it?

UPDATE: I was wondering what ACM did classify under "relational model" (you can probably guess what I suspected). So I asked Erwin to check and here's what he found:

Saturday, December 1, 2012

Data Warehouses and the Logical-Physical Confusion



(Erwin Smout is co-author of this post.)

Revised 8/26/18

In Implementation Data Modeling Styles Martijn Evers writes:
"Business Intelligence specialists are often on the lookout for better way to solve their data modeling issues. This is especially true for Data warehouse initiatives where performance, flexibility and temporalization are primary concerns. They often wonder which approach to use, should it be Anchor Modeling, Data Vault, Dimensional or still Normalized (or NoSQL solutions, which we will not cover here)? These are modeling techniques focus around implementation considerations for Information system development. They are usually packed with an approach to design certain classes of information systems (like Data warehouses) or are being used in very specific OLTP system design. The techniques focus around physical design issues like performance and data model management sometimes together with logical/conceptual design issues like standardization, temporalization and inheritance/subtyping."

"Implementation Data Modeling techniques (also called physical data modeling techniques) come in a variety of forms. Their connection is a desire to pose modeling directives on the implemented data model to overcome several limitations of current SQL DBMSes. While they also might address logical/conceptual considerations, they should not be treated like a conceptual or logical data model. Their concern is implementation. Albeit often abstracted from specific SQL DBMS platforms they nonetheless need to concern themselves with implementation considerations on the main SQL platforms like Oracle and Microsoft SQL Server. These techniques can be thought of as a set of transformations from a more conceptual model (usually envisaged as an ER diagram on a certain 'logical/conceptual' level but see this post for more info on "logical" data models)."

Tuesday, November 27, 2012

Site Update



1.
A new 'To Laugh or Cry?' link was posted on the LAUGH/CRY? page.

2.
A new 'Quote of the Week' was posted on the QUOTES page.

3.
My latest blog post, Not All Structures Are Created Equal, at All Analytics.

4.
Want to get a sense of what a fad looks like? Check out
Big Data News Network. And if are familiar with my argument that instead of leading the industry with science, academia is following industry's fads by substituting vocational training for education, here's an excellent example: 

Announcing a New Master's Degree: Business Analytics 

The department of information systems and the supply chain management department have joined forces to launch an accelerated and specialized master's degree program in the fast growing field of business analytics. The Master of Science in Business Analytics program is full-time, and students will earn their MS in Business Analytics after one academic year. Classes will start in the fall of 2013, a



Saturday, November 17, 2012

Site Update



1.
A new Quote of the week was posted on the QUOTES page.

2.
A new To Laugh or Cry? item was posted on the LAUGH/CRY? page.

3.
It has come to this. It was only a matter of time and I dk what took so long.



Thursday, November 8, 2012

News



1.
Reminder: My presentation
NoSQL, No  Schema and Database Management: Data Fundamentals and Technology Evaluation
San Francisco SQL Server User Group, Wednesday, 11/14/12, 6:30pm
Free and open to the public. Details here.

2.
Quote of the Week was posted to the QUOTES page. Consider it in the context of the comments to my last post, Kinds of keys.

3.
A new To Laugh or Cry? item was posted to the LAUGH/CRY? page.

4.
A new online exchange I participate in was added to the FP ONLINE page.

5.
Here's an item that validates my arguments that (1) SQL or the relational model are not the real problems when it comes to data management distributivity (2) proponents of distributivity and the Cloud have no grasp of what distributivity really entails.

Why Google Went Offline Today and a Bit about How the Internet Works - CloudFlare blog


Thursday, November 1, 2012

Site News



1.
My 11/16 San Francisco presentation. Details here.

I am planning a European tour in the March-April time frame, which will include the UK and possibly Israel. Those interested in organizing seminars or presentations, or who know others that might be, please contact me.

I have added a SCHEDULE option to the site's top menu -- a page listing forthcoming presentations/seminars with links to the events' pages.

2.
I added an FP ONLINE option to the top menu -- a page listing my web columns and online discussions in which I participate.

3.
New Quote and To Laugh or Cry of the Week were posted today.

The QUOTES and LAUGH/CRY pages were also updated with all the posts to date.

From now on Site News will announce when these two pages are updated with new posts so that interested readers can check them out.

4. Matt Rogish drew my attention to an article on programming which is not inconsistent with my claim that the IT industry operates like the fashion industry:

Programming is a Pop Culture
Discussion


Wednesday, October 31, 2012

Weekly News



1.

I will give a presentation for the San Francisco SQL Server User Group:

Microsoft Office
835 Market Street
Suite 700
San Francisco, CA (map)

More details here

2. 
I am working on a European tour in the March-April time frame. UK will probably be included and possible Israel. Those interested in organizing seminars and/or presentations, or know somebody who might be, please contact me.

3.
Google claims it has resolved the problem with updates changing page URLs, so I have added all the quotes to date to the QUOTES page. LAUGH/CRY page is next.

4. It will take me a while to revise all my papers, so if you order the papers now you will get the already revised Business Modeling for Database Design and current versions of the other papers that will entitle you to revised versions when they are published.

5. There seems to be a problem with adding a LinkedIn share button to the blog, with LinkedIn pointing to Google and Google, as is its wont, couldn't care less. Until this is resolved--don't hold your breath-- please share manually. Thanks.

 

Monday, October 29, 2012

The Cloud and SQL



In Why SQL Is Not Suited for the Cloud and What Is NuoDB, one Aswin writes:
In the data world, SQL is a relation-based, data model, and the most common means used to retrieve and manipulate data. Predicates, clauses, expressions and queries that are all used in SQL data management system have made it abundantly popular and user friendly. But this traditional data management system suffers serious of limitations...

Friday, October 26, 2012

Weekly News



1.
My latest column Why Managers Need Business Models (my original title was Modeling and Usefulness).

2.
I am working on a European tour in the March-April time frame. UK will probably be included. Those interested in organizing seminars and/or presentations, or know somebody who might be, please contact me.

Have a nice weekend.


Wednesday, October 24, 2012

Education, Practicality and an Introductory SQL Book



WS: I have a question. I have been asked about the possibility of teaching a SQL course. My audience will be people from a scientific rather than an IT background (which I think will make my job easier). If the training does take place then obviously in my own material I will cover the basics of the relational model before moving on to SQL. One of them has asked me if there is a book I can recommend on the subject to get them started. Now although I own a lot of books about the relational model and about SQL they contain mainly quite advanced material, I don't know of an introductory book about SQL that is theoretically sound. I wonder if you can recommend one? ... I haven't seen anything that I am particularly enthusiastic about. I would say it is a gap in the market, though unfortunately the market probably demands books about Oracle, Sql Server or MySQL. That is, of course, true of trade media and book publishers too.

Sunday, October 21, 2012

To Laugh or Cry




tblIsThere, The Daily WTF


Quote of the Week



Until the rise of nosql solutions, you would force-fit all data and relations into traditional RDBMS model and then do a fire dance in the application layer to manage them. In other words, you would force a cube into a cylinder with enough force that it fits. With nosql solutions, that is not necessary, so you can pick the right solution based on the problem domain. If your data is largely schemaless, there is no point in forcing it into a relational table with most of the columns being null, so a document oriented DB such as MongoDB or CouchDB would be ideal. If you dealing with lists, sets, queues etc., which need to be persisted, then Redis maybe an ideal candidate. If you are managing key/values then Cassandra maybe your choice. Of course, you don't want to have a dozen of these in your system, but a careful use of RDBMS and nosql solutions would greatly simplify application design.
--Have any NoSQL databases become viable replacements for SQL databases yet?, quora.com

Thursday, October 11, 2012

Weekly News



1.
Reminder

My presentation: Foundation Knowledge for Database Professionals
Mountain View, October 16, 6:30pm (open to the public and free)

More details here.

2.
Three LinkedIn Data Modeling group threads in which I participated.

One or two files
Is there a 3NF to denormalized tool
Kinds of Data Models -- And How to Name Them

3.
To new visitors: If you are wondering why the SEMINARS, PAPERS, QUOTES and LAUGH/CRY pages have not been updated, there is a bug in Blogger--the URL changes every time a page is updated--which wreaks havoc with traffic/SEO. Google has not been able to fix this for months and until they do many bloggers do not update the pages. Until a fix is provided, for previous quotes and To Laugh or Cry? posts please use the Blogger search facility at the top of every page or just scroll back to older posts.

4.
Cloud computing: here we go again

Came across this article that is consistent with my arguments about how industry operates.
Summary: The same pattern has emerged again and again in the history of IT. New ideas emerge. Vendors develop their own approaches. Camps form to support those approaches. Eventually standards emerge. The battle over cloud computing standards and approaches is only the latest repetition of the pattern.
The Achiles heel in all this is that many if not most new ideas are not new at all. The only way to ensure that the wheel--and a square wheel at that-- is not constantly reinvented is to rely on sound foundations and knowledge of the history of the field. Otherwise, we're spinning wheels in place at best, or go into reverse.

5.
A comparison

Last week during Oracle Open World I had the pleasure of meeting briefly Jonathan Lewis (with Jonathan Gennick's help) and we had a chat about the state of the industry. In that context, please watch the two videos below and see if you discern a difference.

http://www.youtube.com/watch?v=g65fII0nX8g
http://www.youtube.com/watch?feature=endscreen&NR=1&v=FvRDK2B0T_I

Wednesday, October 10, 2012

To Laugh or Cry?



UPDATE: By mistake I posted a link to a piece that I intended to debunk, which is why you could not see any comments. Here is the correct one.



The Death of the Relational Database,
whydoeseverythingsuck.com

An oldie, but goodie. Particularly some of the comments.



Quote of the Week



SQL Server and all relational databases aren't much more than glorified file handlers. If you accept them as that, then you'll see the falacy [sic] of trying to do things like inheritence [sic] and the wad of other stuff people think it should do. SQL is not a full fledged language nor should it be. It's a simple file handling language. Tables are really nothing more than files with lines (rows) and fields (columns). It's a very simple thing... why do people keep trying to make it more complex?
--Relational Model Advantages,SQLServer Central

Sunday, October 7, 2012

Data Fundamentals, Fads and "Big Data"



WS writes:
It is fairly clear to me that the "NoSQL" movement (or "big data" as some people seem to be calling it nowadays) has thrown out the relational baby with the SQL bathwater.
That is exactly the point I was making in my several posts on NoSQL. Since SQL is practically the only commercially implemented data language with any relational characteristics, most practitioners confuse it with the relational model. I quoted with approval David McGoveran that the NoSQL movement has its roots in the anti-relational sentiment expressed by object and XML proponents. Underlying it is the failure to appreciate the distinction between the fads that characterize the industry's mode of operation and relational technology. This failre is due to disregard for and lack of knowledge of data fundamentals. In fact, the relational model has been treated for a time as just another fad.
In particular I wanted to say that I agree very strongly with Eric Kaun's comments about the supposed inflexibility of relational schemas. In my experience it is actually very easy to change the schema, but often fiendishly difficult to assess the impact of these changes on all the applications that use the database.
For a while I just could not understand what the claims of "fixed relational schema" meant, because it is obvious that the relational approach increased schema flexibility via several types of data independence relative to products that preceded it. But recently it occurred to me that what complainers mean by it is not that the schemas are fixed, but rather that inflexibility means exclusive reliance on tables. Why not change the structural basis of the schema "as needed"? This certainly is an explicit fundamental rejection of the relational model (they cannot express even their complaint correctly).
In spite of the apparent sophistication of many developer environments, none of them, as far as I am aware, do anything as simple as representing the relation between application language expressions and database attributes. This would mean a simple query could find all the dependencies between applications and the database. As is often the case application programmers are ignoring the failings of their own version and dependency management and blaming the DBMS instead.
It depends on what the meaning of "sophisticated" is. It may apply to programming languages, techniques and tools, but not necessarily to data fundamentals such as logic, the different data models with their pros and cons, their practical implications and the history of the field. That explains why old technologies are reinvented and relabeled as new and the constant stream of fads. As to the application-DBMS functional distinction, many developers are not even aware of it and see nothing wrong with having to undertake database functions in applications. Hence the increasing number of "application-specific databases".
Regarding NoSQL, I suspect there is an element of cargo-cult thinking going on here. Google have been very successful (at least from a business perspective) using distributed processing so if you use the same methods as Google then you are bound to be successful too.
Obviously if you lack foundation knowledge you will opt for emulating others without a real understanding to what extent their success--to the extent that it is not a short-lived one--is applicable to your circumstances (see SQL/PostSQL/NoSQL). As an example, from my experience with Google's Blogger in particular and their other products in general (see my previous post) I would not be surprised if their poor ability to fix problems and provide expedient support (if at all!) is probably due to some extent to their reliance on ad-hoc, schema-less products. These problems have the potential to reach the same walls as those reached by pre-Codd non-relational technologies. Facebook, Amazon and other "Big Data" companies reveal similar problems.

The size of the data has nothing to do with the data model. Distribution has nothing to do with the data model. Consider, again, David McGoveran's list of circumstances which may justify a NoSQL product:
First, when discovery of relationships is more important than consistent processing and specific data results.
Second, if the data processing is meant to be inductive (e.g., suggestive) rather than deductive (i.e., precise).
Third, when the application is changing very fast, data complexity is great (variety or amount).
Fourth, if physical issues, like big data or a high degree of parallelism, are more crucial than data integrity. You must be willing to throw away data consistency in favor of performance and scalability.
Fifth, if you have a mission-critical one-off application for which a fixed data organization is ideal, in which case the costs and risks may be lower than licensing a vendor’s RDBMS [read: SQL] or trying to force an open-source RDBMS [read: SQL] to fit the need.
With knowledge of data fundamentals one is tempted to conclude that the intention here was to demonstrate that the set of applicable circumstances is very close to being the empty set. In the absence of such knowledge, the temptation to use these tools in order to avoid hard thinking and design effort upfront, while expecting is just too strong to resist.

Thursday, October 4, 2012

Weekly News




It's been months since a problem was discovered in Blogger's new interface: pages' URLs change when they are updated, which is a disaster for SEO traffic. Amazingly, Google initially asked for details experienced by users, lots were provided, but there was no response from Google or any information that would indicate when a solution would be available, except a comment very recently that they are working on a solution that might be available within a couple of weeks. Worse, it switched users to the new interface before it provided a solution (see Page URLs Change, Though Link References Do Not Change), although it looks like the update introduced the problem to the old interface too (my gut tells me it might be a data management problem).

There's more. As I wrote earlier, I was planning to drop the old dbdebunk.com site and point the this blog to that domain, so that links to the old site would at least reach the blog, if not the original page. Google provides instructions how to use a custom domain with Blogger, so I canceled the old site, but when I followed the instructions, I got an error. Then I was alerted to yet another problem with Blogger:
Known Issue: Custom domains in Google Sites

We are aware of an issue with custom domains in Google Sites. The functionality to add a new custom domain is currently unavailable - however, existing custom domains are working. We are working to resolve the issue as quickly as possible and expect a fix within the next few weeks
It was posted on September 19th and note the timeframe of a solution!
That a company like Google cannot resolve such problems for months (and is unresponsive to boot) despite the damage it causes its users and so many complaints is a a phenomenon on the rise. Companies like Facebook and Google have, on the one hand, huge institutional market power and on the other hand hugely complex software infrastructures that are based on proprietary, ad-hoc software (all those "BigData", NoSQL or even non-database applications goodies) that are extremely difficult to manage, maintain, progress and optimize. This creates the combination of difficulty to satisfy user needs with lack of responsiveness.

Luckily, my domain registrar was able to provide a temporary solution by forwarding dbdebunk.com links to dbdebunk.blogspot.ca without users seeing the .ca.

To Laugh or Cry?



Database design and terminology, dbForums.com

Quote of the Week



If you have a wide existing table containing lots of frequently null columns, it is likely you can make it more efficient if you re-arrange the physical table so that most of the nulls are contiguous to the right hand side of the table. If the table is over 254 columns in certain RDBMSes, you may profit from breaking the table into two pieces such that the overflow is less often referenced and when you need the whole it is via a view slicing the two physical tables together one-for-one.
--Database tables and NULLs, stackexchange.com

Monday, October 1, 2012

Normalization, Further Normalization, Ease of Use, Integrity and Performance



Revised: 10/15/16
"Normalization was invented in the 70's as a way to put some structure around how developers were storing data in a database, in addition to trying to save disk space. You need to remember this was a time when 1MB was billions of dollars and a comput er needing 1GB of data storage was inconceivable. It was a way to squeeze as much data into as small a space as possible." --Tom Phillips, social.technet.microsoft.com
Perhaps the lack of understanding of the relational model was so acute at the time when it was first published (1969-70), that it would not surprise me if a belief existed then that normalization would save storage space, even if I don't understand in what this belief was grounded. But there is no justification for such a belief to persist in 2012, no matter what else one thinks of normalization.

For the multiple advantages from full normalization (5NF) -- chief among them semantic correctness of query results (i.e., no anomalous side-effects)--see the just published THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, available via the BOOKS page).

Thursday, September 27, 2012

A Note on Education vs. Training



Gene Wirchenko drew my attention to an Infoworld article by Andrew Oliver, Ill-informed haters go after MongoDB, which is kind of a response to the articles critical of MongoDB which I commented on in previous posts. The gist of the article is described as: "NoSQL databases like MongoDB are great for some tasks but not for others. Is it MongoDB's fault if misguided developers use it to solve the wrong problem?"
With any new technology comes a wave of marketing happy talk, which in turn leads to inexperienced developers "jumping on the train" of a new fad. Inevitably, these newbies find themselves disappointed that the technology doesn't deliver on their inflated expectations.
Oliver correctly identifies the core systemic problem in database management, one that I have been warning of for almost my entire 25+ career in the field: the lethal combination of proliferation of thoroughly hyped ad-hoc products and technologies by vendors unfamiliar with the foundation and history of the field to database professionals and users equally unfamiliar with same. Neither do I find fault with the advice he offers at the end of his article:
"Take blogs with a grain of salt ... make sure you understand the technology before using it on a critical project. If you don't heed this advice, some writer for Infoworld on a short deadline in a slow news week might decide to ridicule you!
although I don't think the ridicule by journalists, even less knowledgeable about what they cover, is the most serious consequence.

But he fails to make the connection between a major source of the problem and the effectiveness of his advice.

The IT industry in general and the database field in particular rely almost exclusively on tools experience. Practitioners are inducted in the field mainly via practice with specific tools that happen to be in vogue at specific times; job descriptions don't require much beyond that; and academia has been turned away from science and education into a research and certification vehicle for vendors and their tools, a trend which Dijkstra has attacked decades ago much better than I can. I experienced this personally on more than one occasion. To recall two:
  • When I offered a presentation on data fundamentals to a reputable computer science department, there was no interest, as they were too busy with "XML research".
  • When I tried to teach an introductory course in database management at a local university by developing a syllabus on data fundamentals, I was quickly disabused of that illusion by a demand to use a specific book and teach Oracle.
A lot is being made on the "high education bubble", the exploding cost of an academic education and the burdening indebtness caused by it. Among its many implications there is an insidious one. Enormous pressure is exerted, for obvious reasons, on academia to turn from educational to vocational: from employers and vendors (via various incentives that are hard to resist) and from students, particularly those sponsored by employers or vendors.

I do not believe that knowledge of and experience with tools alone is sufficient to address the problems underlying the database field. Without foundation knowledge, including the history of the field, relabeled old discarded products will continue to proliferate and practitioners will lack the capacity to avoid being seduced by hype.

Indeed, one could argue that the attraction of the so-called "schema-less" NoSQL products is due to the difficulty to think conceptually and logically about requirements and evaluate technologies and products critically because the necessary knowledge and ability to reason and abstract--distinct from tool experience--have not been inculcated. The commonly used assertion "different databases for different purposes" or "the right tool for the right task" are trivial and trite and can be misleading without the benefit of foundation knowledge external to the tools themselves.

Note very carefully that I do not mean to imply that tool experience is unimportant, which would be nonsense. Rather, I claim that it is necessary but insufficient for intelligent functioning in the database field, as it probably is in many other fields.

Wednesday, September 26, 2012

Mountain View Presentation



I will be giving a presentation to the Silicon Valley SQL Server User Group:

Foundation Knowledge for Database Professionals

October 16, 6:30pm

1065 La Avenida
Building 1
Mountain View, CA (map)

It is open to all. Please join us and invite anybody who might be interested.

More details here

Quote of the Week



I've read a few things about NoSQL (technology or movement? That is the question!). As my colleague Stuart McLachlan rephrased the question: If it's a movement, the real question is whether it's a religious or bowel movement.
--SQL/PostSQL/NoSQL, artfulopinions.blogspot.com

To Laugh or Cry?



To Laugh or Cry? items are specifically selected as lost causes (usually, the problems lie with either the author of the item, or with those who respond to it, or both). For that reason I normally do not comment on them. This week's piece is somewhat unusual in that the target is not the author, but rather a vendor. The following item was brought to my attention by Matt Rogish.

Diego Basch, I’ll Give MongoDB Another Try. In Ten Years.

Saturday, September 22, 2012

Davide and David on NoSQL



After publishing my 3-parter on NoSQL I came across a post by Davide Mauri and an interview with David McGoveran on the subject, to which I would like to add some clarifications (this is not a debunking, as they hardly justify such).

Mauri asks: "Do NoSQL people really want to drop the relational model?"

Thursday, September 20, 2012

PRACTICAL DATABASE FOUNDATIONS PAPER #1: BUSINESS MODELING FOR DATABASE DESIGN




Paper #1, revised and expanded, is available for purchase. Please follow the instructions on the PAPERS page to order.

About this Paper

The purpose of this paper is to introduce practitioners to data fundamentals and advance their understanding via business modeling for database design.
It focuses particularly on and distinguishes between the three types of model that are often confused: business model, logical model and data model.

The paper will delve into each one in detail, explain the basic modeling and design concepts, principles and methodology of formalizing an informal business model via the relational data model into a formal logical database model which can be computerized.

Preface

Introduction

Business Modeling
Basic Modeling Concepts
Business Rules
Property Rules
Class Rules

Attribute Rules

Distinctness Rules

Functional Dependency Rules

Referential Rules

Associative Entities

Optional Rules
Business Models
Database Design
Formalizing the Informal
Predicates and Propositions

The Relational Data Model
Relational Structure

Relational Manipulation

Relational Integrity
External Predicates

Integrity Constraints and Internal Predicates
Logical Models
Understanding Database Management
A Foundation Framework
Applying the Framework (Exercises)
Appendix A: Integrity Constraints Formulation and Verification

References

Wednesday, September 19, 2012

Data Models and Usefulness



In my 3-part article on dynamic schema, NoSQL and the relational model I admitted that I did not know much about NoSQL products in general and MongoDB in particular. Nevertheless, it was not difficult to figure out what problems would be faced using a docubase for database management purposes.

Then Matt Rogish alerted me to Why I Migrated Away From MongoDB. I suggest you read it in the context of my article and see how close my suspicions were to reality. Here are some quotes worth pondering (emphasis mine):
Alas, not being aware of the mathematics behind relational algebra, I could not see clearly the trap I was falling into - document databases are remarkably hard to run aggregations on and aggregating the data and presenting meaningrful statistics on your receipts is one of the core features of digiDoc. Without the powerful aggregation features that we take for granted in RDBMSs, I would constantly be fighting with unweildy map-reduce constructs when all I want is SUM(amount) FROM receipts WHERE <foo> GROUP BY <bar>. 
People keep complaining that JOINs make your data hard to scale. Well, the converse is also true - Not having JOINs makes your data an intractable lump of mud. 
...when I last looked something as simple as case-insensitive search did not exist. The recommended solution was to have a field in the model with all your search data in it in lower case [FP: heh, heh]... And if I add a new field to the model? Time to regenerate all the search strings. I can only come to the conclusion that mongodb is a well-funded and elaborate troll.
...somewhere along the stack of mongodb, mongoid and mongoid-map-reduce, somewhere there, type information was being lost.
Then it might have been the lack of an enforced schema? Thinking about it though, schemas are wonderful. They take all the constraints about your data and put it in one place. Without a schema, this constraint checking would be spread all over my application. A document added a month ago and a document added yesterday could look completely different and I’d have no way of knowing. Such fuzzy schemaless data models encourage loose thinking and undisciplined object orientation.
Anybody with foundation knowledge would expect these problems. As I argued so many times: a data structure determines manipulation and, therefore, usefulness for a given informational purpose.

Oh, and in this context, see my AllAnalytics posts and the exchanges with readers: 

Knowing What a Database Is
Unstructured Data

Tuesday, September 18, 2012

Object Orientation, Logic and Database Management



WS writes:
On the subject of object orientation I often feel inclined to quote Leslie Lamport's comment in the introduction to his book "Specifying Systems": "If exposure to C++ hasn't completely destroyed your ability to think logically, you should have no trouble filling in the gaps in your mathematics education".
I tried for many years to understand OO, but without success. It didn't occur to me until I encountered the material on dbdebunk that the problem might not lie with me. OO's lack of a formal definition leads, I think inevitably, to what I have come to call PALC (Pointless Additional Layers of Complexity). Every addition to OO leads to new problems that require ever more complicated solutions, that in turn create a whole new set of problems.

Monday, September 17, 2012

To Laugh or Cry?




Is it really necessary to normalize your databases in third normal form?, social.technet.microsoft.com

Quote of the Week



The products commonly known as Oracle, Exadata, DB2, Sybase, SQL Server, Teradata, Sybase IQ, Netezza, Vertica, Greenplum, Aster, Infobright, SAND, ParAccel, Exasol, Kognitio et al. all either are or incorporate relational database management systems, aka RDBMS or relational DBMS.

2. In principle, there can be difficulties in judging whether or not a DBMS is “relational”. In practice, those difficulties don’t arise — yet. Every significant DBMS still falls into one of two categories:

    Relational:
        Was designed to do relational stuff* from the get-go, even if it now does other things too.
        Supports a lot of SQL.
    Non-relational:
        Was designed primarily to do non-relational things.*
        Doesn’t support all that much SQL.

*I expect the distinction to get more confusing soon, at which point I’ll adopt terms more precise than “relational things” and “relational stuff”.
--Curt Monash

Weekly DBDebunk News



1. I have added
  • a "Link to this blog" button and code at the top right
  • URL, HTML and BB/Forum link codes at the end of each post
If you like the blog or posts, please use them to link to them. This being a new blog, it will help with visibility. Thanks.

2. Shortly I will switch the blog to the old domain, dbdebunk.com. You don't need to do anything as the blogspot will simply forward you to that domain.

3. For those interested in databases and gaming, there is a forum thread at

https://forums.eveonline.com/default.aspx?g=posts&m=1919633

about problems with EDK, an older 'massive multiplayer online' (MMO) game based on mySQL, the latter supposedly being broken by the application/user load. The discussion is about the new DBMS to be considered and the choice between SQL and NoSQL (MongoDB about which I wrote in my three-part article, is mentioned).

I do not presume to know the technicalities and terminology of gaming systems and I am not in a position to judge whether current SQL systems can adequately satisfy gaming in general and EDK in particular. Perhaps readers who have knowledge in this area can enlighten us.

But if you read the exchange carefully you can detect the familiar sources of the (misleading) attraction of NoSQL: lack of familiarity with history and foundation knowledge, developers not transcending their application view of databases, the logical-physical confusion, the failure by SQL and its implementations to adhere to the relational model and so on. There is one developer with foundation knowledge, though, who is readily discernible.

We are regressing decades. I may not be around when somebody will have to "discover" the relational model in order to address the very same problems that Codd  thought he did 40+ years ago.

4. Matt Rogish has submitted this link: 

Why I Migrated Away From MongoDB
which I may comment on later.

Friday, September 7, 2012

Forward to the Past: "Out-clevering" the DBMS



When I see a title like "Software Engineering: What are some clever programming techniques that aren't used often enough?" my antennae come up. More often than not (1) programmers consider certain things clever only because they are not familiar with history (2) in its quest for solving problems engineering sometimes tries to out-clever science, which usually spells trouble.

Wednesday, September 5, 2012

Quote of the Week



Relational databases ... had poisoned whole generation of programmers thinking (incl. me). With no clue we approached everything through relational prism that we studied in university. MySQL was the only OSS alternative that cut enough corners to make relational DBMS fit the problems we tried to solve. For which the relational paradigm wasn't a good fit anyways.

So. then comes along the NoSQL stuff and you find out that you have much better tools than relational for these kinds of problems (simple websites, blogging sites, feed sites, social sites, etc.). Along the way you might see all the other problems that really need realational (ERP systems etc.). But now you need real relational not some half-baked all corners cut MySQL stuff. Yeah that also goes for InnoDB which might have become something if Oracle hadn't freezed it.

Then you try out Postgres (because it's free) and when it works for big workloads (yes it does in skilled hands) then you are hooked. If by chance you then check out Oracle ... it feels like a twelve ton dinosaur ("erm. no boolean type? whaat - implicit commit with DDL?"). If by any miracle you don't have the 1% of projects that need something special - Oracle RAC or Spatial or whatnot. Then Postgre is the most mainstream option available - there just is no other as popular, as functional and as cheap solution for these kinds of problem domains that really require relational DBMS.

PS: been using MySQL (both MyISAM/InnoDB) over 10 years, PostgreSQL over 5 years and Oracle over 5 years. --Hipsters Hacking on PosgreSQL, theRegister.com

To Laugh or Cry?




We don't Need No Stinkin' Database, The Daily WTF

Friday, August 31, 2012

Weekly DBDebunk News



1.
As most of my readers know, for various reasons I stay away from products and will occasionally touch on them only in a very specific context of data and relational fundamentals, in both the negative (e.g. false claims, violations, or errors) and positive (true and correct implementations) sense. For such purposes one needs the very rare combination of thorough product experience in actual practice and knowledge, understanding and appreciation of fundamentals. I happen to know a few such practitioners and I've asked them for input when necessary and they kindly agreed. So in the future expect to see some of their contributions.

Recently somebody requested in a LinkedIn database related group advice on how to select a DBMS. He received a few responses which included some important factors such as cost, support, development community and so on, but all were practically devoid of critical factors such as technical properties and, as one of my advisors pointed out, suitability of those to business needs. As another of my advisors correctly pointed out, this may have been OK several years ago, but not now, when the BigData/NoSQL fad is in full bloom and the circumstances seem to have regressed to the good old pre-relational days of IMS and CODASYL.

I have asked some of my advisors to offer their thoughts on DBMS selection and I am getting some interesting replies which provide evidence for the validity of my gut sense about these "DBMSs", which I will use in the appropriate contexts. It does look, though, like even with my limited knowledge of NoSQL products, the gist of my impression about them seem to have been spot on: "we don't know much about the data and what we logically need to do with it, except that it's a lot of it and we want to store it physically in such ways that will maximize the performance of those operations that we hope to discover we need."

Stay tuned.


2.
I have added a TRDBMS links section (bottom right). It currently includes Dataphor and Rel, both are implementations of Date & Darwen's Tutorial D (which is intended to be a full development language with intergrated relational capabilities). The former is an industrial strength open source TRDBMS used by the Database Consulting Group. The latter is currently an educational vehicle that is being developed into a product. Both are free, so you can get a feel of what for what a truly relational product looks like/can do relative to SQL and the new wave of products claiming to be improvements over it.

Have a good weekend.

Wednesday, August 29, 2012

To Laugh or Cry?



Help designing database table, dbForums

Quote of the Week



Q: normalization of receipt help pls

   DESCRIPTION|QTY|UNIT COST|TOTAL|TRANSACTION TYPE

   i need to convert it to 1st NF, 2nd NF , 3rd NF and BCNF
   pls help i already did a lot of searching in the net. i really cant seem to understand how normalization works.

A: (Expert): Look for E. F. Codd book.

--forums.databasejournal.com

Sunday, August 26, 2012

Schema, NoSQL and the Relational Model Part 3



I started in Part 1 with Haberman's justifiable doubts about "schema-less" NoSQL databases, using the MongoDB docubase as an example; and an exchange on the problematics of a "document data model" with Matt Rogish. In Part 2 I completed the exchange using the example of the W3C effort to define such a model for XML. I am now returning to schema and MongoDB DocBMS.

Thursday, August 23, 2012

Weekly DBDebunk News



1.
The last Laugh or Cry? item was a thread on NULLs at StackExchange.com. A reader commented that he could not tell whether I am "mocking (1) the question (2) the current answers or (3) the StackExchange forum" in the thread. So perhaps I should explain: Laugh or Cry? items are too taxing to debunk and pretty hopeless. But these items are not just entertainment (or pain, as it is for some of us), but also opportunities to test yourself on foundation knowledge. Lack of clarity on what is wrong indicates some gap that needs filling.

Now, of course, not everything is wrong in long threads, but it is precisely the ability to distinguish between sensible pronouncements and nonsense that foundation knowledge confers.

In fact, a thread I debunked at the old site is an excellent illustration of what I mean. Stay tuned.

UPDATE: One more reason for old posts: despite all the claims of "progress", some are evidence to the contrary: that fundamental problems are not resolved and that there is even regress in that progress.

2.
I last revised the papers in January 2011. The current revisions are major expansions/overhauls which I may well renumber version 1. The first paper I announced earlier, Business Modeling for Database Design, required a bit
more effort than I initially thought, so I have delayed the publication a bit for further refinements--it'll be worth the wait.

I am also testing the waters of self-publishing on Kindle --no small hassle--with the kind help of Charlie Clark (thanks, Charlie).

The seminars are undergoing a similar overhaul. I highly recommend them ( and I am not biased! :)). Go ahead, organize one.

3.
If you noticed, I am interleaving some posts from the old site with new ones. This is not just laziness or to fill space. I am soon going to drop the old site and I am selecting very carefully those that are too valuable to lose.

Besides, their revision does not take much less time and effort than writing new ones.

I have now added the original post date and dated the revised section to all of them, for proper context (thanks, Yiorgos, for suggesting it).

Have a nice weekend.

Wednesday, August 22, 2012

To Laugh or Cry?



Representative Table,The Daily WTF

Quote of the Week



The First Normal Form: Stipulates that no repeating groups or sets should be stored within a table. These similar groups should be stored in a separate table. To identify each unique column of these tables, a primary key should be established

The Second Normal Form: Data Redundancy should be non-existent. Foreign key constraints should be enforced.

The Third Normal Form: Every column in the table should be related and dependant on the primary key of the table. If this is not possible, the field should be stored in a new table with a new key.

The Fourth Normal Form: This one would probably exists somewhere in dream land - the elimination of independent relationships in a Relational Database.

The Fifth Normal Form: Exists in never-never land - it calls for a perfect segregation of logically related  many-to-many relationships.

So now you know something about relationships (that's what our whole Relational Database thing is about right?). But just keep in mind that as we increase and tighten our relationship enforcements, there would be a little trade off with performance.--George Alexander, A fundamental approach to Database design

Sunday, August 19, 2012

SQL Sins




Ovid: I've recently purchased the book DATABASE IN DEPTH by C. J. Date and I've been enjoying it very much.  It's been quite an eye-opener to discover that most of my gripes about "databases" were actually related to SQL.

To that end, I've accidentally started a lively discussion about the "merits" of SQL at Why SQL Sucks (with a little Perl to fix it). dbdebunk.com is getting mentioned quite a bit (and not always in a favorable light, I might add).  (Mine is the root post in the thread). Admittedly, I'm just learning about many of the issues involved. So despite years of working with databases, I may have some of the basics off a bit.

The reason I mention this is because that site is very high profile amongst Perl programmers and convincing the people at that site about the issues with SQL would go a long way to reaching a broad swath of the tech world. 


Friday, August 17, 2012

Weekly DBDebunk News



1. My first post for my new Data Fundamentals column at TechWeb's AllAnalytics:

Knowing What a Database Is

2. Last week I reported some Blogger bugs. One of them changes the URL's of the static pages when they are updated. The Seminars, Papers, LaughCry, Quotes and About are static pages, which means that I won't update them until Google fixes it, although I will continue to post updates in the blog.
If such an update scrolls before you read it you will have to go back to it to read it. I apologize for the inconvenience.

I am keeping a list of the pertinent posts and will update the pages as soon as the problem is solved.

3. Oops, forgot. Some of you may be experiencing broken links. I have a bunch of posts in draft mode in the pipe and and at one point I mistakenly posted all of them and had to take them off. Some of you are accessing those dead links. But not to worry: all of them will be posted in time, although they may have different URL.Again, sorry for the inconvenience, chuck it to birth pains.

Have a nice weekend.

Wednesday, August 15, 2012

To Laugh or Cry?



Database table and NULLs, stackexchange.com

Quote of the Week



In the database world, the raw physical data model is at the center of the universe, and queries freely assume intimate details of the data representation (indexes, statistics, metadata). This closed-world assumption and the resulting lack of abstraction have the pleasant effect of allowing the data to outlive the application. On the other hand, this makes it hard to evolve the underlying model independently from the queries over the model.

As the move to the cloud puts pressure on the closed-world assumption of the database, exposing naked data and relying on declarative magic becomes a liability rather than an asset. In the cloud, the roles are reversed, and objects should hide their private data representation, exposing it only via well-defined behavioral interfaces...

The world of database models is noun-based, talking about Customers, Orders, LineItems, etc. Once modelers have designed the data model correctly, they consider their job done.

In the realm of modelers, there is no notion of data abstraction that separates abstract properties of the model from the concrete details of the fully normalized realization in terms of tables with PK/FK (primary-key/foreign-key) relationships. --All Your Database Are Belong to Us, Erik Meijer

Schema, NoSQL and the Relational Model Part 2



Part 1 ended with my following comment to Matt Rogish, on the subject of a document data model:
As Codd realized, to do database management you must have some data model, period! You cannot do it without one. Indeed, a schema is based on it.

So in order to design a document database system of the kind you envision you must first define a document data model: structure, manipulation and integrity. What exactly is it?
Part 2 continues my response to Matt, using an exchange between me and Hugh Darwen to illustrate  what happened when an attempt was made by a W3C committee to come up with an XML document data model.

Friday, August 10, 2012

Weekly DBDebunk News (UPDATED)



Welcome and thanks to readers, subscribers, well wishers and all expressions of interest and support.

1. Except for some Blogger bugs, the blog format seems to be working better than the old site, so at some point in the next few weeks I will drop the latter and probably shift the blog to the old domain. As far as I can tell, the current blogspot URL will continue to work. I will keep you posted.

2. Currently my top priority is to revise and expand my papers, seminars and to add to them. I am also experimenting with self-publishing on Kindle, which proves to be a hassle. This will keep me busy for a while, so if I  am not as responsive/active as I would like to be, thanks for your patience.

3. As a result of comments to two of my posts that identified some problems, I have taken them offline for revision and I will repost them as soon as I can.
Looks like I am a bit rusty after so many years in the wilderness. I apologize for the inconvenience. (UPDATE: My revised first post has been republished).

Please feel free to contact me at the address on the About page with submissions for debunking, questions, comments or whateve.

It's good to be back.
FP

Sunday, August 5, 2012

Schema, NoSQL and the Relational Model, Part 1



In What's Wrong with the Schema? Stephen Haberman is
...wondering why schema-less databases are so popular lately. Most any NoSQL store is schema-less. And while perhaps schema-less-ness is an integral part of NoSQL (e.g. most NoSQL databases are just opaque key/value stores), I would assert it’s an orthogonal concern, and that document-oriented databases, e.g. MongoDB, could arguably have a schema. However, MongoDB doesn’t just say it’s schema-less for technical reasons related to being a NoSQL store, it actually touts its lack of a schema as a benefit, claiming it is “agile” and offers “simplicity and power”. I find all of this confusing, as I actually want a schema.

Friday, August 3, 2012

To Laugh or Cry (UPDATED)



Readers of my old site may remember the "To Laugh or Cry?" feature. It was a series of links to material to which it was difficult to decide which of the two reactions would be justified.

Note: Weekly Quotes and To Laugh or Cry? items are archived on two pages accessible from the top menu, so that they can be perused after the posts scroll.  

Here is the first link initiating the series here.  

Comment On SQL Split, The Daily WTF

Wednesday, August 1, 2012

Quote of the Week



The relational calculus is good in describing sets. But it´s bad at describing relations between data in different sets. Explicit identities (primary keys) need to be introduced and normalization is needed to avoid update inconsistencies due to duplication of data.

To say it somewhat bluntly: The problem with the relational calculus and RDBMS etc. is the focus on data. It´s seems to be so important to store the data, that connecting the data moves to the background.

That might be close to how we store filled in paper forms. But it´s so unlike how the mind works.

There is no data stored in your brain. If you look at the fridge in your kitchen, there is no tiny fridge created in your brain so you can take the memory of your fridge with you, when you leave your kitchen. --Ralf's Sudelbücher, Musings on relations - or: WinFS is not enough

Re-write




See: Physical Independence Part 1: Don't Mix Model with Implementation



View My Stats