PAPERS


PRACTICAL DATABASE FOUNDATIONS SERIES


This series are a set of papers and seminars, the objective of which is to make the scientific foundation of database management and its practical implications accessible to data professionals and users.

They cover aspects that are not addressed, or are distorted in the IT industry, are not sufficiently or properly clarified and/or are misunderstood. They stay as close to and and as accurate about the formal foundations of the database field as possible, while using simple, readily understood language.


NEW! Paper #6 (October 2014) NEW!


DOMAINS: THE DATABASE GLUE


Domains are a fundamental database feature. Codd, the inventor of the relational model referred to them as the "glue that holds the database together". Mathematical relations are defined on domains and, therefore, so are their database representations, R-tables. Yet they are one of the least understood database features. This is both a cause and a consequence of the failure by SQL DBMS's to implement them truly and fully. The negative implications e.g., weak support for complex and/or user-defined data types" is claimed a weakness of the relational model, rather than an industry implementation failure. This paper
  • Defines and explains the domain concept;
  • Distinguishes between domains and data types;
  • Discusses kinds of domains;
  • Clarifies DBMS domain support;
  • Discusses practical implications for database and DBMS design.

Table of Contents


1. Property Business Rules and Domains

2. Domains vs. Data Types

3. Value Atomicity and Encapsulation

4. Kinds of Domains
4.1. Simple and Complex Domains
4.2. System-defined Domains
4.3. User-defined Domains

5. DBMS Domain Support
5.1. Complex Domains

6. Practical Implications
6.1. Relational Domains vs. Object Classes
6.2. Business Modeling
6.3. Database Design
6.4. DBMS Design
6.4.1. SQL “Domains”
6.4.2. “Universal” DBMS

7. Conclusion

References


REVISED! Paper #1, V. 3 (September 2014) REVISED!


BUSINESS MODELING FOR DATABASE DESIGN


Few data management aspects are as widely misunderstood and abused as business modeling and database design. Four types of model are often confused, with costly implications: 
  • Business model 
  • Logical model  
  • Physical model
  • Data model 
This is an introduction to data fundamentals critical to business modeling for database design. It focuses particularly on conceptual models of business, database logical models and the relational data model that facilitates maping the former to the latter.The paper 
  • explains basic modeling and design concepts
  • how to formalize a informal business models via the relational data model into formal logical database models which can be computerized and managed by a DBMS.

Table of Contents

Introduction

1. Business Modeling
1.1. Basic Modeling Concepts
1.2. Business Rules
1.2.1. Property Rules
1.2.2. Class Rules
1.2.3. Attribute Rules
1.2.4. Identity Rules
1.2.5. Functional Dependency Rules
1.2.6. Referential Rules
1.2.7. Associative Entities
1.2.8. Arbitrary Rules
1.3. Business Models

2. Database Design
2.1. Formalizing the Informal
2.2. Predicates and Propositions
2.3. The Relational Data Model
2.3.1. Relational Structure
2.3.2. Relational Manipulation
2.3.3. Relational Integrity
2.3.3.1. External Predicates
2.3.3.2. Internal Predicates
2.4. Logical Models

3. Understanding Database Management
3.1. Note on missing values
3.2. A Foundation Framework

Appendix A: Constraint Formulation and Verification
Appendix B: Integrity Constraints in Dataphor’s D4
Appendix C: Applying the Framework

References


REVISED! Paper #2, v. 6 (September 2014) REVISED!


THE COSTLY ILLUSION: NORMALIZATION, INTEGRITY AND PERFORMANCE


Note: This paper assumes familiarity with the concepts and terminology introduced in papers #1, Business Modeling For Database Design.

A core database design principle is the Principle of Full Normalization (PFN). Database designs that do not adhere to it present certain practical drawbacks for data manipulation, integrity enforcement and, consequently, for application development and usability. Despite the plethora of information on the subject (not all of it correct, or well explicated), the subject is still poorly understood.

Paper #1 in this series, Business Modeling for Database Design, outlines a methodology that implicitly produces fully normalized databases. But due either to inadvertent errors, or to intentional “denormalization for performance”, PFN violations occur frequently. They impose considerable costs to which many data professionals are oblivious. Data redundancy and the risk of inconsistent databases is only one of them, albeit a major one.

Explicit further normalization should be necessary only for database design repair, when databases were poorly designed, to eliminate the drawbacks.This paper explains in easy to understand language:
  • The kinds of PFN violation possible;
  • The costly implications of PFN violations;
  • How to repair poor designs and eliminate the drawbacks;
  • Why denormalization for performance is an illusion.

Table of Contents 

Introduction

1. R-tables, Keys and Dependencies

2. Normalization and Normal Forms

3. Further Normalization As Design Repair
3.1. Join Dependencies
3.2. “The Whole Key” and 2NF
3.3. “Nothing But the Key” and 3NF
3.4. “The Whole Key” and BCNF
3.5. Multivalued Dependencies and 4NF
3.6. Interval Data and 6NF

4. “Denormalization For Performance”
4.1. The Logical-Physical Confusion
4.2. Redundancy Control
4.3. JDC’s and SQL
4.4. The Real Problem and Solution

5. Conclusion and Recommendations

References


Paper #3 V.3 (August 2013)


THE FINAL NULL IN THE COFFIN: A RELATIONAL SOLUTION TO MISSING DATA


Note: This paper assumes familiarity with the concepts and terminology introduced in papers #1, Business Modeling For Database Design.

The relational data model is based on the two-valued logic (2VL) of the real world: every proposition about the real world is unequivocally true or false. But our knowledge of the real world is usually imperfect—some data is missing—which means that we don't always know whether certain propositions are true or not. Then 2VL no longer applies and database query results are no longer guaranteed to be provably logically correct with respect to the real world.

Missing data has possibly been the thorniest aspect of database management: without a logically sound yet practical solution, data professionals and users are left between a rock and a hard place. They must either (a) rely on SQL's arbitrary and flawed implementations of three-valued logic (3VL) based on NULLs and risk results that are easy to misinterpret, or erroneous in ways hard to discern, or (b) undertake in applications a prohibitively complex, error prone and unreliable burden that belongs in the DBMS.

This paper illustrates some of the drawbacks of the many-valued logic (nVL, n > 2) approach to missing data and SQL’s NULL scheme and proposes a solution within the 2VL/relational framework that:
  • Guarantees data integrity and logically correct query results;
  • Avoids the complications and problematics of nVL/NULL's;
  • Requires no changes to the relational model;
  • Is largely transparent to users;
  • Keeps users better apprised of the existence and effects of missing data. 
Note: The proposed solution requires research into its implications for data manipulation and integrity enforcements before it is implemented, but we believe it is theoretically sound and implementable in a truly relational DBMS (TRDBMS) using technologies that, unlike SQL, support full physical data independence e.g. the TransRelational™ Model (TRM).
  
Table of Contents

Introduction

1. “Inapplicable Data”: Nothing's Missing

2. Missing Data: Into the Unknown

3. SQL’s 3VL: NULL

4. Known Unknowns: Metadata

5. A 2VL Relational Solution
5.1. The Practicality of Theory
5.2. 2VL vs. NULL: A Real World Example
5.3. Relation Proliferation
5.4. The TransRelational™ Implementation Model

Conclusion

Appendix A: Misconceptions Debunked

References


Paper #4 V.1 (January 2014) 


THE KEY TO KEYS: A MATTER OF IDENTITY



Note: This paper assumes familiarity with the concepts and terminology introduced in papers #1, Business Modeling For Database Design

If entities in the real world did not have identifiers—attributes that capture their identity and uniquely identify them—we would not be able to tell them apart. It follows that an accurate database representation of a business reality must include keys, R-table columns that formally represent the real world identifiers in the database.
Keys and types thereof, their necessity, selection, function and properties are often not well known and understood. This paper
  • Defines and explains the key concept;
  • Explains the function and properties of the various types of key;
  • Describes the criteria for key selection;
  • Specifies what is proper DBMS key support;
  • Assesses SQL's key support;
  • Debunks some common misconceptions about keys.
Note: Figures are numbered by Chapter. The examples are kept as simple as possible to make it easy to understand the ideas, but the reader should consider the implications for realistically complex tables.

Table of Contents

Introduction

1. R-tables and Integrity Constraints

2. Keys and Key Constraints

3. Kinds of Keys
3.1. Natural Keys
3.1.1. Candidate and Primary Keys
3.1.2. Simple and Composite Keys
3.2. Surrogate Keys

4. Key Functions
4.1. Duplicate prevention
4.2. Integrity Burden Reduction
4.3. View Updatability

5. Foreign Keys and Referential Constraints

6. DBMS Key Support

7. Keys in SQL
7.1. SQL and Duplicates

Conclusion

Appendix A: Duplicate removal in SQL
Appendix B: Duplicates and Language Redundancy

References


REVISED! Paper #5 V.2 (September 2014) REVISED!


TRULY RELATIONAL - WHAT IT REALLY MEANS: CODD'S ORIGINAL 1969-70 PAPERS 



In the first paragraph of his first ever published exposition of the relational idea 42 years ago, E. F. Codd claimed three critical advantages of a relational model of data:
  • A sound theoretical basis—logic and mathematics—for database management
  • Physical data independence
  • DBMS-guaranteed integrity of data and query results
He also made explicit the objective of relieving practice from the complexities and rigidity of the hierarchic and network commercial DBMS products at the time.

Yet from 1969 to date the industry has failed to concretize Codd’s ideas by implementing SQL-based products that have limited relational fidelity and violate many relational principles. Moreover, instead of correcting the mistakes, the industry has been regressing to the same costly and unproductive technologies made obsolete by Codd’s innovation more than thirty years ago.

Much of this is due to an educational failure: vendors, experts, users and the trade press do not know, understand and appreciate the practical value of Codd’s contribution and the huge cost of ignoring it. Young generations of practitioners are not even introduced to the relational model and are instilled either with the notion that SQL products are relational, or that relational technology is obsolete. Driven by industry rather than principles, academia has renounced its true function of education and is serving as product trainer for vendors.

It is therefore imperative—and proper to start this series, intended to make data fundamentals accessible to practitioners—to revisit Codd’s original work, reassert those aspects that have been ignored, recall those that were missed, clarify those that are opaque, correct misinterpretations as well as original mistakes and settle current disagreements and confusion over what the relational model really is.

This paper covers Codd’s seminal first two papers, Derivability, Redundancy and Consistency of Relations Stored in Large Data Banks (1969), and A Relational Model of Data for Large Shared Data Banks (1970), the latter being an important public revision of the former (an internal IBM document), that contained changes and introduced new material.

Table of Contents

Introduction

 1. Relations on Domains

 2. Relation Representation

 3. Time-varying Relations

 4. Relation Interpretation

 5. Data Sub-language

 6. Atomicity, Nested Relations and Normalization

 7. Foreign Keys and Normal Form

 8. Operations on Relations

 9. Kinds of Relations

10. Derivability, Redundancy, Consistency

Conclusion

Appendix A: Debunking Misconceptions

References




USE OF MATERIAL POLICY

Publisher hereby grants to purchaser non-exclusive, non-transferable, revocable rights to use the materials for non-commercial purposes as follows:

Authorized Use

Authorized Users: Individuals who purchase the materials for their own exclusive use;

Authorized Uses: Authorized users may use the materials as is consistent with the Fair Use Provisions of United States and international law, for non-commercial purposes as follows:
  • Individual buyers have exclusive access to the materials;
  • Academic buyers may use the material, with appropriate credit, in the preparation of course packs or other non-commercial, educational materials;
  • Authorized may use, with appropriate credit, figures, tables and brief excerpts in the user's own scientific, scholarly and educational works;
  • Institutional volume buyers should check with Publisher on authorized use;
Specific Restrictions on Use of Materials

Unauthorized Use: Authorized users shall not

  • Knowingly permit others to use the materials;
  • Modify or create a derivative work without prior written permission of Publisher;
  • Remove, obscure or modify any copyright or other notice;
  • Use the materials for commercial purposes, including but not limited to sale, reproduction, or distribution in any form;


PRICING

Single paper rates

Individuals $15.00
Corporate   $25.00
Nonprofit   $20.00
Students    $10.00
  
Contact us for volume licenses.

Order all papers and get a two-paper combo free:
WHAT FIRST NORMAL FORM REALLY MEANS by C. J. Date
WHAT FIRST NORMAL FORM MEANS NOT by F. Pascal



UPDATES AND NEW VERSIONS

Updates are free.
New versions:
  • Free to buyers in the same year. 
  • Preceding year buyers get 25% discount.
  • New purchases for older buyers.

HOW TO ORDER

We prefer payments in US$ via PayPal. For check/money order payments email us at the email address on the ABOUT page.

  1. Select the paper(s) to order;
  2. Calculate the total order price;
  3. Specify the papers you order in the text field below;
  4. Click on the Buy Now button;
  5. At the payment form, login to your Paypal account;
  6. Upon notification of payment by PayPal, we will email download instructions.


Which paper(s)?

Address any questions or payment/delivery issues to the above address.
We appreciate your support, which keeps this site free.

Thank you.

Fabian Pascal
Founder, Editor and Publisher

1 comment:

  1. Some professors erroneously tell students The item "law school is not all about memorization." my spouse and i say "erroneously" since the law school will be information about memorization... thus much more. But for the moment, let's only focus in grades - AS WELL AS with regard to all courses, That means focusing at exams. electronic press kit

    ReplyDelete