Saturday, April 29, 2017

This Week

1. Database Truth of the Week

“… logic—is an analytical theory of the art of reasoning whose goal is to systematize and codify principles of valid reasoning. It has emerged from a study of the use of language in argument and persuasion and it is based on the identification and examination of those parts of language which are essential for these purposes. It is formal in the sense that it lacks reference to meaning. Thereby, it achieves versatility: it may be used to judge the correctness of a chain of reasoning (in particular, a “mathematical proof”) solely on the basis of the form (and not the content) of the sequence of statements, which make up the chain.” --R. R. Stoll

Monday, April 17, 2017

Don't Mix Model with Implementation

Here's what's wrong with last week's database picture, namely:
"When you design your database tables there are some important things to think of:
- Normalize to remove redundant data
- Use the smallest datatype possible
- Create as few indexes as possible, but not too few
- Avoid redundant indexes
- Every table must have clustered index
...
This is important in a normal database but it is even more important in SQL Azure because you have limited space for every database, your connections may be terminated due to heavy use of resources, you pay for what you use and the data that you transfer. You can use the SQL Azure management portal do design your tables or write the T-SQL statement yourself. The syntax to create a table in SQL azure is the same as in SQL server 2005/2008, but not all options are supported in SQL Azure.
...
CREATE TABLE [dbo].[table1]
 ([id] [int] IDENTITY(1,1) NOT NULL,
  [column1] [NVARCHAR](50) NOT NULL,
  [column2] [NVARCHAR](15) NOT NULL,
  [column3] [TINYINT] NULL,
  CONSTRAINT [pk_table1] PRIMARY KEY CLUSTERED ([id] ASC )
   WITH (-- PAD_INDEX = OFF,
            STATISTICS_NORECOMPUTE = OFF,
            IGNORE_DUP_KEY = OFF,
         -- FILLFACTOR=80,
         -- ALLOW_ROW_LOCKS = ON,
         -- ALLOW_PAGE_LOCKS = ON,
         -- DATA_COMPRESSION=PAGE)"
--Håkan Winther, A SQL Azure tip a day – Create a table
Read it all.

Saturday, April 15, 2017

This Week

Database Truth of the Week

"... systems of operations on data are most effective when they are formalisms, in which semantic considerations are unimportant until the formalism is applied to some specific application. In this way, database processing can join the ranks of successful mathematical abstractions. Differential equations, for instance, can be applied to situations ranging from orbit calculations to the quantum mechanics of the atom. The semantics of each application is unique to that application, but the formalism of differential equations is common. The power of the formalism lies in its abstraction from issues of meaning." --T. H. Merrett, Extending the Relational Algebra to Capture Less Meaning

Thursday, April 6, 2017

Understanding Kinds of Keys

My March post @All Analytics.

According to search queries hitting @dbdebunk.com, too many data professionals question the mandatory nature of primary keys, ask about changes to them, or prefer surrogate to natural keys. This indicates misunderstanding and misuse of a critical feature that can wreak havoc with inferences made from databases, including analytics. I have explained one function of keys, but there are several types of key that are poorly understood.

Read it all (and please comment there, not here).




Saturday, April 1, 2017

"NULL Value" is a Contradiction in Terms

There is nothing wrong with Hugo Kornelis' picture of SQL NULL in NULL: The database's black hole. In fact, I recommend the series of which it is one part. It's SQL's picture of how to treat missing data that's wrong.
"Let’s first take a look at what NULL is supposed to be. Here is the definition of NULL from the SQL-2003 standard: null value--A special value that is used to indicate the absence of any data value."
While the absence of a value may be represented by some value at the implementation level, I strongly recommend users not think of NULL as any kind of value at the model level. The problems with NULL stem precisely from the fact that it is not a value, but rather a marker for the absence of a value. "NULL value" is a contradiction in terms that distracts from the problems.
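A minimal sketch of why NULL does not behave like a value, assuming Python's built-in sqlite3 module as a stand-in for any SQL DBMS (the choice of SQLite and the variable names are illustrative assumptions, not taken from the post or from Kornelis' series). A value is always equal to itself, but SQL does not evaluate NULL = NULL to true:

```python
import sqlite3

# An in-memory SQLite database is enough; no tables are needed.
conn = sqlite3.connect(":memory:")

# Comparing NULL with anything, even NULL, is neither true nor false.
# SQLite returns NULL for the comparison, which Python sees as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# A genuine value compares equal to itself (SQLite returns 1 for true).
one_equals_one = conn.execute("SELECT 1 = 1").fetchone()[0]

# Absence has to be tested with a dedicated operator, not with equality.
null_is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]

print(null_equals_null)  # None
print(one_equals_one)    # 1
print(null_is_null)      # 1
```

The need for a separate IS NULL operator is itself a hint that NULL is a marker sitting outside the domain of values.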

Sunday, March 26, 2017

This Week

1. What's wrong with this picture?

"Things get more complex when NULLable columns are used in expressions and predicates. In a procedural language, this wouldn’t have been a problem--if a procedural program fails to find the information it needs, it enters a conditional branch to handle this situation, as defined by the programmer. In a declarative, set-based language such as SQL, this was not possible. The alternatives were either to have the SQL developer add conditional expressions for each nullable column in a query to handle missing data, or to define a decent default behavior in SQL for missing data so that developers only have to write explicit conditional expressions if they need to override the default behavior." --Hugo Kornelis, NULL - The database's black hole

(Nothing wrong with Hugo's picture--in fact, I highly recommend the series of which the source of this quote is one part--only with SQL's picture of relational treatment of missing data).
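To make the "default behavior" in the quote concrete, here is a small sketch, again assuming Python's built-in sqlite3 module as a stand-in DBMS. The employee table, its columns, and its rows are hypothetical and exist only for illustration. A predicate on a NULLable column silently drops the rows where the column is NULL, unless the query adds an explicit condition for them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: bonus is NULLable, and Cy's bonus is missing.
conn.execute("CREATE TABLE employee (name TEXT NOT NULL, bonus INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ann", 100), ("Bob", 0), ("Cy", None)],
)

# Default behavior: for Cy the predicate is neither true nor false,
# and WHERE keeps only the rows for which it is true.
default_rows = [row[0] for row in conn.execute(
    "SELECT name FROM employee WHERE bonus <> 100 ORDER BY name"
)]

# Explicit override: the developer states how missing data is handled.
override_rows = [row[0] for row in conn.execute(
    "SELECT name FROM employee "
    "WHERE bonus <> 100 OR bonus IS NULL ORDER BY name"
)]

print(default_rows)   # ['Bob']
print(override_rows)  # ['Bob', 'Cy']
```

Note that Cy appears in neither the bonus = 100 result nor the bonus <> 100 result, so under the default a predicate and its negation do not partition the table.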

Sunday, March 19, 2017

New Paper: The Interpretation and Representation of Database Relations

The data management field cannot and will not progress without educated and informed users. Recently I announced UNDERSTANDING THE REAL RDM, a new series of papers that will
  • Offer to the data practitioner an accessible informal preview of David's work.
  • Contrast it with the current common interpretation that emerged after EFC's passing and demonstrate the practical implications of the differences.