Wednesday, October 31, 2012

Weekly News


I will give a presentation for the San Francisco SQL Server User Group:

Microsoft Office
835 Market Street
Suite 700
San Francisco, CA (map)

More details here

I am working on a European tour in the March-April time frame. UK will probably be included and possible Israel. Those interested in organizing seminars and/or presentations, or know somebody who might be, please contact me.

Google claims it has resolved the problem with updates changing page URLs, so I have added all the quotes to date to the QUOTES page. LAUGH/CRY page is next.

4. It will take me a while to revise all my papers, so if you order the papers now you will get the already revised Business Modeling for Database Design and current versions of the other papers that will entitle you to revised versions when they are published.

5. There seems to be a problem with adding a LinkedIn share button to the blog, with LinkedIn pointing to Google and Google, as is its wont, couldn't care less. Until this is resolved--don't hold your breath-- please share manually. Thanks.


Monday, October 29, 2012

The Cloud and SQL

In Why SQL Is Not Suited for the Cloud and What Is NuoDB, one Aswin writes:
In the data world, SQL is a relation-based, data model, and the most common means used to retrieve and manipulate data. Predicates, clauses, expressions and queries that are all used in SQL data management system have made it abundantly popular and user friendly. But this traditional data management system suffers serious of limitations...

Friday, October 26, 2012

Weekly News

My latest column Why Managers Need Business Models (my original title was Modeling and Usefulness).

I am working on a European tour in the March-April time frame. UK will probably be included. Those interested in organizing seminars and/or presentations, or know somebody who might be, please contact me.

Have a nice weekend.

Wednesday, October 24, 2012

Education, Practicality and an Introductory SQL Book

WS: I have a question. I have been asked about the possibility of teaching a SQL course. My audience will be people from a scientific rather than an IT background (which I think will make my job easier). If the training does take place then obviously in my own material I will cover the basics of the relational model before moving on to SQL. One of them has asked me if there is a book I can recommend on the subject to get them started. Now although I own a lot of books about the relational model and about SQL they contain mainly quite advanced material, I don't know of an introductory book about SQL that is theoretically sound. I wonder if you can recommend one? ... I haven't seen anything that I am particularly enthusiastic about. I would say it is a gap in the market, though unfortunately the market probably demands books about Oracle, Sql Server or MySQL. That is, of course, true of trade media and book publishers too.

Sunday, October 21, 2012

To Laugh or Cry

tblIsThere, The Daily WTF

Quote of the Week

Until the rise of nosql solutions, you would force-fit all data and relations into traditional RDBMS model and then do a fire dance in the application layer to manage them. In other words, you would force a cube into a cylinder with enough force that it fits. With nosql solutions, that is not necessary, so you can pick the right solution based on the problem domain. If your data is largely schemaless, there is no point in forcing it into a relational table with most of the columns being null, so a document oriented DB such as MongoDB or CouchDB would be ideal. If you dealing with lists, sets, queues etc., which need to be persisted, then Redis maybe an ideal candidate. If you are managing key/values then Cassandra maybe your choice. Of course, you don't want to have a dozen of these in your system, but a careful use of RDBMS and nosql solutions would greatly simplify application design.
--Have any NoSQL databases become viable replacements for SQL databases yet?,

Thursday, October 11, 2012

Weekly News


My presentation: Foundation Knowledge for Database Professionals
Mountain View, October 16, 6:30pm (open to the public and free)

More details here.

Three LinkedIn Data Modeling group threads in which I participated.

One or two files
Is there a 3NF to denormalized tool
Kinds of Data Models -- And How to Name Them

To new visitors: If you are wondering why the SEMINARS, PAPERS, QUOTES and LAUGH/CRY pages have not been updated, there is a bug in Blogger--the URL changes every time a page is updated--which wreaks havoc with traffic/SEO. Google has not been able to fix this for months and until they do many bloggers do not update the pages. Until a fix is provided, for previous quotes and To Laugh or Cry? posts please use the Blogger search facility at the top of every page or just scroll back to older posts.

Cloud computing: here we go again

Came across this article that is consistent with my arguments about how industry operates.
Summary: The same pattern has emerged again and again in the history of IT. New ideas emerge. Vendors develop their own approaches. Camps form to support those approaches. Eventually standards emerge. The battle over cloud computing standards and approaches is only the latest repetition of the pattern.
The Achiles heel in all this is that many if not most new ideas are not new at all. The only way to ensure that the wheel--and a square wheel at that-- is not constantly reinvented is to rely on sound foundations and knowledge of the history of the field. Otherwise, we're spinning wheels in place at best, or go into reverse.

A comparison

Last week during Oracle Open World I had the pleasure of meeting briefly Jonathan Lewis (with Jonathan Gennick's help) and we had a chat about the state of the industry. In that context, please watch the two videos below and see if you discern a difference.

Wednesday, October 10, 2012

To Laugh or Cry?

UPDATE: By mistake I posted a link to a piece that I intended to debunk, which is why you could not see any comments. Here is the correct one.

The Death of the Relational Database,

An oldie, but goodie. Particularly some of the comments.

Quote of the Week

SQL Server and all relational databases aren't much more than glorified file handlers. If you accept them as that, then you'll see the falacy [sic] of trying to do things like inheritence [sic] and the wad of other stuff people think it should do. SQL is not a full fledged language nor should it be. It's a simple file handling language. Tables are really nothing more than files with lines (rows) and fields (columns). It's a very simple thing... why do people keep trying to make it more complex?
--Relational Model Advantages,SQLServer Central

Sunday, October 7, 2012

Data Fundamentals, Fads and "Big Data"

WS writes:
It is fairly clear to me that the "NoSQL" movement (or "big data" as some people seem to be calling it nowadays) has thrown out the relational baby with the SQL bathwater.
That is exactly the point I was making in my several posts on NoSQL. Since SQL is practically the only commercially implemented data language with any relational characteristics, most practitioners confuse it with the relational model. I quoted with approval David McGoveran that the NoSQL movement has its roots in the anti-relational sentiment expressed by object and XML proponents. Underlying it is the failure to appreciate the distinction between the fads that characterize the industry's mode of operation and relational technology. This failre is due to disregard for and lack of knowledge of data fundamentals. In fact, the relational model has been treated for a time as just another fad.
In particular I wanted to say that I agree very strongly with Eric Kaun's comments about the supposed inflexibility of relational schemas. In my experience it is actually very easy to change the schema, but often fiendishly difficult to assess the impact of these changes on all the applications that use the database.
For a while I just could not understand what the claims of "fixed relational schema" meant, because it is obvious that the relational approach increased schema flexibility via several types of data independence relative to products that preceded it. But recently it occurred to me that what complainers mean by it is not that the schemas are fixed, but rather that inflexibility means exclusive reliance on tables. Why not change the structural basis of the schema "as needed"? This certainly is an explicit fundamental rejection of the relational model (they cannot express even their complaint correctly).
In spite of the apparent sophistication of many developer environments, none of them, as far as I am aware, do anything as simple as representing the relation between application language expressions and database attributes. This would mean a simple query could find all the dependencies between applications and the database. As is often the case application programmers are ignoring the failings of their own version and dependency management and blaming the DBMS instead.
It depends on what the meaning of "sophisticated" is. It may apply to programming languages, techniques and tools, but not necessarily to data fundamentals such as logic, the different data models with their pros and cons, their practical implications and the history of the field. That explains why old technologies are reinvented and relabeled as new and the constant stream of fads. As to the application-DBMS functional distinction, many developers are not even aware of it and see nothing wrong with having to undertake database functions in applications. Hence the increasing number of "application-specific databases".
Regarding NoSQL, I suspect there is an element of cargo-cult thinking going on here. Google have been very successful (at least from a business perspective) using distributed processing so if you use the same methods as Google then you are bound to be successful too.
Obviously if you lack foundation knowledge you will opt for emulating others without a real understanding to what extent their success--to the extent that it is not a short-lived one--is applicable to your circumstances (see SQL/PostSQL/NoSQL). As an example, from my experience with Google's Blogger in particular and their other products in general (see my previous post) I would not be surprised if their poor ability to fix problems and provide expedient support (if at all!) is probably due to some extent to their reliance on ad-hoc, schema-less products. These problems have the potential to reach the same walls as those reached by pre-Codd non-relational technologies. Facebook, Amazon and other "Big Data" companies reveal similar problems.

The size of the data has nothing to do with the data model. Distribution has nothing to do with the data model. Consider, again, David McGoveran's list of circumstances which may justify a NoSQL product:
First, when discovery of relationships is more important than consistent processing and specific data results.
Second, if the data processing is meant to be inductive (e.g., suggestive) rather than deductive (i.e., precise).
Third, when the application is changing very fast, data complexity is great (variety or amount).
Fourth, if physical issues, like big data or a high degree of parallelism, are more crucial than data integrity. You must be willing to throw away data consistency in favor of performance and scalability.
Fifth, if you have a mission-critical one-off application for which a fixed data organization is ideal, in which case the costs and risks may be lower than licensing a vendor’s RDBMS [read: SQL] or trying to force an open-source RDBMS [read: SQL] to fit the need.
With knowledge of data fundamentals one is tempted to conclude that the intention here was to demonstrate that the set of applicable circumstances is very close to being the empty set. In the absence of such knowledge, the temptation to use these tools in order to avoid hard thinking and design effort upfront, while expecting is just too strong to resist.

Thursday, October 4, 2012

Weekly News

It's been months since a problem was discovered in Blogger's new interface: pages' URLs change when they are updated, which is a disaster for SEO traffic. Amazingly, Google initially asked for details experienced by users, lots were provided, but there was no response from Google or any information that would indicate when a solution would be available, except a comment very recently that they are working on a solution that might be available within a couple of weeks. Worse, it switched users to the new interface before it provided a solution (see Page URLs Change, Though Link References Do Not Change), although it looks like the update introduced the problem to the old interface too (my gut tells me it might be a data management problem).

There's more. As I wrote earlier, I was planning to drop the old site and point the this blog to that domain, so that links to the old site would at least reach the blog, if not the original page. Google provides instructions how to use a custom domain with Blogger, so I canceled the old site, but when I followed the instructions, I got an error. Then I was alerted to yet another problem with Blogger:
Known Issue: Custom domains in Google Sites

We are aware of an issue with custom domains in Google Sites. The functionality to add a new custom domain is currently unavailable - however, existing custom domains are working. We are working to resolve the issue as quickly as possible and expect a fix within the next few weeks
It was posted on September 19th and note the timeframe of a solution!
That a company like Google cannot resolve such problems for months (and is unresponsive to boot) despite the damage it causes its users and so many complaints is a a phenomenon on the rise. Companies like Facebook and Google have, on the one hand, huge institutional market power and on the other hand hugely complex software infrastructures that are based on proprietary, ad-hoc software (all those "BigData", NoSQL or even non-database applications goodies) that are extremely difficult to manage, maintain, progress and optimize. This creates the combination of difficulty to satisfy user needs with lack of responsiveness.

Luckily, my domain registrar was able to provide a temporary solution by forwarding links to without users seeing the .ca.

To Laugh or Cry?

Database design and terminology,

Quote of the Week

If you have a wide existing table containing lots of frequently null columns, it is likely you can make it more efficient if you re-arrange the physical table so that most of the nulls are contiguous to the right hand side of the table. If the table is over 254 columns in certain RDBMSes, you may profit from breaking the table into two pieces such that the overflow is less often referenced and when you need the whole it is via a view slicing the two physical tables together one-for-one.
--Database tables and NULLs,

Monday, October 1, 2012

Normalization, Further Normalization, Ease of Use, Integrity and Performance

Revised: 10/15/16
"Normalization was invented in the 70's as a way to put some structure around how developers were storing data in a database, in addition to trying to save disk space. You need to remember this was a time when 1MB was billions of dollars and a comput er needing 1GB of data storage was inconceivable. It was a way to squeeze as much data into as small a space as possible." --Tom Phillips,
Perhaps the lack of understanding of the relational model was so acute at the time when it was first published (1969-70), that it would not surprise me if a belief existed then that normalization would save storage space, even if I don't understand in what this belief was grounded. But there is no justification for such a belief to persist in 2012, no matter what else one thinks of normalization.

For the multiple advantages from full normalization (5NF) -- chief among them semantic correctness of query results (i.e., no anomalous side-effects)--see the just published THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, available via the BOOKS page).
View My Stats