ON PERFORMANCE CONSIDERATIONS CONTAMINATING LOGICAL DESIGN
with Fabian Pascal and Chris Date

 

 

 

From: CB

To: Editor

 

Would a column that holds a bitmask value be considered a multivalued column (and thus a violation of 1NF)?

 

Say, for instance, that my database stores (among other things) application users and the rights each user has within the application (e.g., "read invoice", "create invoice", etc).  There exists a many-to-many relationship between users and the rights they possess.  The normal way to set this up (in accordance with the Normal Forms) would be to create a Users table, a Rights table, and a UserRights table (which associates user ids with their ids of their associated rights).

 

However, someone thought of the idea of eliminating the UserRights table, and, instead, adding a column to the Rights table (RightsBit) to store a "bit" value for each right (i.e., 1, 2, 4, 8, 16, 32, etc.), which would serve as that table's primary key.  Them they added a column to the Users table (RightsBitMask) to store the bitmask of that user's rights (i.e., if the user has right 2 and right 8, his RightsBitMask column value would be 10.  If he were assigned rights 1, 4, and 16, his value would be 21).

 

The advantage of this is that, when I want to find out which people have a certain right, I don't have to perform any joins.  I can simply execute a query that performs a bitwise AND operation between the RightsBitMask column and the bit value of the right(s) I'm querying for.  This runs much faster than performing a join query (of course, it goes without saying that, if I were using a TRUE RDBMS, there would be no performance issue - but at my company I'm forced to use a SQL DBMS.  Since a SQL DBMS doesn't separate the logical from he physical design, things such as joins and normalization do indeed have a performance impact).

 

In addition, it is argued that using this bitmask field would not slow down insert/update performance or introduce the possibility of inaccurate data, because if I needed to change the rights a user had, all I would need to do is perform an addition or subtraction operation on that one single field.

 

What are your thoughts on all this?

 

 

From: Fabian Pascal

To: CB

 

I do not give specific design advice because I do not know enough about the data and its uses to speak intelligently of it. To do that I would have to learn about it and that's not feasible via email. I can only state general principles.

 

First, read The Dangerous Illusion Parts 1 and 2 and see if that gives you some ideas.

 

If you read my book, you know that the meaning of a table is its predicate. Formulate the predicates of the 3 tables in the relational design and those of the 2 tables in the bitmask design and see if you can detect any differences/problems.

 

This is a good example of how physical implementation flaws cause users to corrupt the logical design process with performance considerations, which can only lead to trouble (see: Note on A Quote). You may have no choice with a current product, but at least know what you're buying into.

 

 

From: CB

 

I had written you a while ago regarding the use of bitmask columns as a way of getting around needing an intermediate table (i.e., a "user" table and a "rights" table, with a bitmask column in the "user" table containing the sum of the bit values in then rights bit field of the "rights" table in order to indicate what rights a user has).

 

My original question was: would such a field constitute a multivalued attribute, and thus be a violation of 1NF?  And also, could this introduce integrity problems?

 

Well, I think I've answered my own question!

 

(1) This design would not allow for referential integrity.  In other words, let's say that the bit value for the "create contract" right was 4, and that of the "approve contract" right was 8.  The bitmask column of the users table for a user who had these 2 rights would be 12. But ... there would nothing to stop someone from changing the bit values in the rights column (say, to change the bit value for "create contract" to 16, and that of "approve contract" to 2). That would instantly corrupt all the data in the users table; each user would be shown to have the wrong set of rights.

 

The only way to avoid this would be to create a trigger on the rights table, preventing anyone from changing the bit values and the rights each bit represents.  But triggers create a performance overhead, which might outweigh any performance improvements gained by using the bitmask field rather than the intermediate table.

 

(2) In addition, using bit and bitmask fields severely reduces the number of records that the table could hold.  The bit field for record 1 would be 1, record 2 would be 2, record 10 would be 512, etc., and with only 30 or so records, you would end up with bit values that might be too large for the column's data type.  Thus, if you wanted to hold more than, say, 30 records in the rights column, you would need to create a second column, then a third, and so on.  Thus you would end up with repeating groups - another violation of the normal forms.

 

So I guess there's no getting around the need for fully normalized database design.  I thought I had found a trick to do so, but I was wrong.

 

 

Fabian Pascal (with Chris Date) Responds: Initially I was not sure I understood the bitmask scheme. Sure enough, when I violated my reluctance to review/assess the specific design, my reply contained some incorrect details. CB then sent me the following description of the two possible designs:

 

Ø       Normalized version of database:

 

USERS

USERID USERNAME     

===============

1       Bill

2       Joe

3       Jane

:        :

===============

 

RIGHTS

RIGHTSID  RIGHTSDESC

=========================

1         Read

2         Write

3         Delete

4         Edit

Chg Permissions

:          :

=========================

 

USERIGHTS

USERID  RIGHTSID

================

1        1

1        2

1        3

1        4

1        5

2        1

3        2

4        :

================

 

(the “intermediate” or “joining” table between USERS and RIGHTS)

 

 

Ø       Denormalized version employing bit and bitmask column:

 

RIGHTS

RIGHTSBIT  RIGHTSDESC

==========================

 1         Read           

 2         Write           

 4         Delete           

 8         Edit           

16         Chg Permissions

 :           :              

==========================

 

USERS

USERID  USERNAME  RIGHTSBITMASK

===============================

1        Bill      31

2        Joe        1

3        Jane      10

:        :         :

===============================

 

How does this work?  The RIGHTSBITMASK field in the Users table contains the sum of the RIGHTSBIT values in the RIGHTS table for each right the user possesses.  Bill has all rights assigned to him, so his RIGHTSBITMASK value is 31 (1+2+4+8+16).

 

To find out which rights a user has, one merely needs to query that user’s RIGHTSBITMASK field using a bitwise AND comparison.  There is no need to “join” to the RIGHTS table.  To add or remove a right from a user, one merely needs to add or subtract the appropriate RIGHTSBIT value to that user’s RIGHTSBITMASK field.

 

Based on this description we note as follows:

 

Ø       The bitmask scheme is not a denormalized design; it does not violate 1NF. Both designed are normalized, one is simply good design, the other has drawbacks. (Much confusion surrounds the issue of 1NF; it is not possible to do it justice in this exchange, so we intend to publish an article on the topic in the near future.) In the meantime, we refer the reader to the following sources where the subject is discussed:

 

§   C.J. Date, AN INTRODUCTION TO DATABASE SYSTEMS, 7th ed., pp. 127-129 and 372-374.

§   Relation-Valued Attributes or Will the Real First Normal Form Please Stand Up? in C.J. Date and H. Darwen, RELATIONAL DATABASE WRITINGS 1989-1991

§   RM Prescription 7: TYPE GENERATOR RELATION, pp. 135-139, C.J. Date and H. Darwen, THE THIRD MANIFESTO, 2nd ed. (especially part b)

§   C.J. Date, H. Darwen and N.A. Lorentzos, TEMPORAL DATA AND THE RELATIONAL MODEL

 

Ø       My initial comment that the bitmask scheme is poor design remains generally valid. Performance usually involves tradeoffs, and the optimal tradeoff varies with databases, DBMSs and the pattern of access by applications, which is one reason not to contaminate logical design with implementation details. In this case, to avoid a third table (and joins), one table is widened by one additional column, thus probably trading off performance of applications that need to access the individual table for both queries and updates, for performance of applications that need information from both tables.

Ø       The most undesirable cost of the scheme, however, is, as CB figured out himself, its integrity implication (which is usually the consequence of poor design, see The Dangerous Illusion Parts 1 and 2). Instead of relying on the straightforward referential integrity in the three-table design case, the two-table design requires a database constraint (one spanning two tables, see the integrity chapter in PRACTICAL ISSUES IN DATABASE MANAGEMENT) to be formulated by the user. The problem is that current SQL products are not likely to support such a constraint declaratively, which means that it would have to be implemented procedurally by a trigger (in itself not very desirable) or, worse, by application code. What is more, even if the declarative constraint were supported, it turns out to be nightmarish (not to mention that it involves a join for updates!). And to enforce the constraint, the DBMS will have to perform the very join that is to be avoided in the first place!!!

 

We will illustrate the problem in SQL for reasons of familiarity, even though it is not to our taste.  Given the two tables RIGHTS and USERS, the columns RIGHTS and RIGHTSBITMASK are of type BIT(N). Rights are assigned Ids--RIGHTBITs that are powers of 2--i.e., the ith right has RIGHTSBIT equal to a bit string in which the ith bit, counting from the right, is a one and the rest all zeros. (We assume here that (a) there are exactly N distinct rights and (b) those rights are set up in one-to-one correspondence (somehow, possibly arbitrarily with the integers 1 to N.)

 

One benefit of this encoding scheme is that the query “Who has all of the rights in some specified set S?” is easily dealt with by setting up a mask SMASK containing one-bits in all and only those positions corresponding to the privileges in S and ANDing that mask with RIGHTSBITMASK:

 

SELECT userid,…

FROM users

WHERE (smask AND rightsbitmask) = smask;

 

(We ignore here the fact that the SQL standard doesn’t actually support AND, OR, etc., on bit strings!—but I’m sure some products do. If they don’t, they should. So should the standard, of course.)

 

In practice, however, there might be some values of i for which no corresponding right exists (e.g., some rights might have been deleted, or they might not have been assigned in a “dense” fashion in the first place). Thus, we need to ensure that every one-bit appearing anywhere in USERS.RIGHTSBITMASK actually appears in RIGHTS.RIGHTSBIT. This constraint is somewhat analogous to a foreign key constraint, but it isn’t a FK constraint, because it refers to components of column values instead of column values per se. Here then is one possible declarative specification of such a constraint:

 

CREATE ASSERTION DREADFULLY_COMPLEX

 CHECK ((NOT EXISTS

        (SELECT *

         FROM users

         WHERE SUBSTRING(rightsbitmask FROM 1 FOR 1)) = B’1’

           AND NOT EXISTS

              (SELECT *

               FROM rights

               WHERE SUBSTRING(rightsbitmask FROM 1 FOR 1) = B’1’))

        AND

        (NOT EXISTS

        (SELECT *

         FROM users

         WHERE SUBSTRING(rightsbitmask FROM 2 FOR 1)) = B’1’

           AND NOT EXISTS

              (SELECT *

               FROM rights

               WHERE SUBSTRING(rightsbitmask FROM 2 FOR 1) = B’1’))

       AND

 

       ...

 

       AND

       (NOT EXISTS

       (SELECT *

        FROM users

        WHERE SUBSTRING(rightsbitmask FROM N FOR 1)) = B’1’

          AND NOT EXISTS

              (SELECT *

               FROM rights

               WHERE SUBSTRING(rightsbitmask FROM N FOR 1) = B’1’)));

 

Problems with the foregoing (if it is supported by the DBMS):

 

·   It’s complicated. While it might be possible to simplify it, there is almost zero chance that it will perform as well as a traditional FK constraint—even though, conceptually, it is very similar to a FK constraint (and FK constraints receive special case implementation in SQL, but more general constraints do not.

 

·   FK constraints allow you to specify certain compensating actions (declaratively, of course)—e.g., cascade delete. The foregoing expression allows nothing analogous.

 

·   Perhaps most telling of all: The foregoing expression is fragile—it involves N separate conditions ANDed together. If the value of N changes (i.e., if the number of rights supported changes—quite likely), the constraint will have to be changed too. This shouldn’t happen to a dog.

 

Attempts to resolve performance issues at the logical level, forced by the poor relational fidelity and implementation of SQL and its commercial implementations (a) bias the database for some applications and against others and/or (b) complicate integrity enforcement. Only by ignoring these costs can such endeavors be deemed “better”. Practitioners should be particularly suspicious of “bit-twiddling schemes”, which together with SQL products defeat a practical relational purpose: human performance bottlenecks usually overwhelm machine performance bottlenecks, so the latter should not be traded off for the former.

 

A well-implemented, truly relational DBMS (TRDBMS) would obviate the need for such complications. If products force users into such schemes, users should be aware of the cost and its real cause, instead of blaming the relational model and/or design.

 

Posted 01/17/03

 

 

 

[ABOUT] [QUOTES] [LINKS]