ON GET() ACCESSORS
with C. J. Date


Date: 23 Mar 2005

From: KH

 

This message is really in reply to Why GET() Accessors Are Bad by Dave Jarvis, though it has been mentioned a few other times in a couple of articles I mention in my References section. The article is available at: http://www.joot.com/dave/writings/articles/encapsulation.shtml.

 

I have deconstructed the article by rewriting it in the same style, but reflecting the object to relational mapping as I see it. My day job is as a Java programmer working with J2EE and Oracle databases.

 

I only spent a couple of hours writing and editing this, so please forgive any syntactic errors. Errors of semantics I'll just have to live with being flamed for, however...

 

Introduction

 

As a system develops, more features are added, which increases its overall complexity. Complexity is not necessarily a bad thing--it is often, in fact, necessary--but it does need to be carefully managed. As complexity increases, the mechanisms by which information is modified and accessed must be constantly refactored and recategorized, or the system will become unwieldy as well as complex. Object-orientation (OO) provides many features which guard against complexity of interface, but not necessarily against complexity of implementation. And by itself, OO is no substitute for good coding practice, which must recognise refactoring and recategorization as necessary consumers of programming resources.

 

I'll assume a slight familiarity with object-orientation and the relational model in what follows, but what you'll really need in order to grasp what I'm talking about is some hands-on experience with real-world business applications of databases, and how they can (and do) go wrong.

 

The Principle

 

Information hiding is often good for interface simplification and expediency, but sometimes bad for implementation complexity or efficiency. One thing which must surely be hidden is how data is represented at the physical level. However, information must necessarily be presented to external objects for manipulation, lest all the complexity inherent in all the possible manipulations of that information be kept in a single object. This is just a matter of common sense.

 

My favourite example is the relational model of data. The power of the relational model comes from the correct application of information hiding. The representations of data are hidden, because the relational model is a mathematical one--correspondingly there is no programmer burden arising from how data is to be stored. (Assuming you didn't have to implement it, of course--so happy is the life of the modern programmer.) But the information itself--the facts comprising your database--this is not hidden, and this is what lets you produce enormously powerful queries which can reduce mounds of unwieldy information to the answer you were looking for.

 

The most correct object implementation of a tuple, in this author's opinion (and note that I am not suggesting you do this in an imperative, functional or logical implementation), is an object which exposes the information in every attribute for reading in a representation-independent way, perfectly in line with the OO principle of encapsulation*.

 

* And let me take this opportunity to define what I mean by encapsulation, lest I upset my hero Mr Date. Encapsulation is where an object does not expose, in any way, *how* something is to be done, but instead offers a way *for* it to be done. Now this is really a matter of degree, because some objects may consider that their peer objects should perform many very small actions in order to achieve their more ambitious aims, providing ultimate flexibility at the expense of some reusability, and ultimately, some encapsulation. Object-orientation tends to prefer many "hard-working" methods with few parameters or one super "hard-working" method with many parameters which does a lot, but tells the peer object absolutely nothing about the sequence which achieved it. But good coding practice does tend to negate these ideals--we don't really want lots of methods, or lots of parameters, so what we usually get is lots of objects. Which is arguably worse, but I digress. (I am actually in the lots of methods camp, but I do like to code super methods as well.)

 

Let's concentrate on reading for a moment, and consider writing later. There are two approaches, and the one that I favour, which is endorsed by the J2EE standard, is to have many small actions or methods which can get at representation-independent versions of the attributes of the tuple. One per attribute, in fact, or otherwise we couldn't do relational things in Java. Though arguably we shouldn't be trying to do relational things in Java at all, it is kind of soothing to have the database represented in objects which actually meet the object-oriented ideal of representing something in the "real world," even if that real world is a database of relations.
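
A minimal sketch of this first approach, assuming a hypothetical employee relation with attributes EmpNo, Name and HireDate (none of these names come from the article itself). The class exposes one typed accessor per attribute and says nothing about how the values are stored:

```java
import java.time.LocalDate;

// Hypothetical tuple of an employee relation; the attribute names are illustrative only.
final class EmployeeTuple {
    private final int empNo;
    private final String name;
    private final LocalDate hireDate;

    EmployeeTuple(int empNo, String name, LocalDate hireDate) {
        this.empNo = empNo;
        this.name = name;
        this.hireDate = hireDate;
    }

    // One accessor per attribute: callers see typed values, never storage details.
    int getEmpNo() { return empNo; }
    String getName() { return name; }
    LocalDate getHireDate() { return hireDate; }
}
```

Because every accessor has its own return type, a caller that asks for the wrong kind of value is rejected by the compiler rather than at run time.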

 

The other approach (still present in Java in the ResultSet object) is to have several super methods, one for each type (ignoring the type-unsafe possibility of having one super method for all types), each of which can return any attribute of that type--which attribute we get depends on the parameter we pass. This makes tuples anonymous, which is bad for type safety, though it does have advantages for doing arbitrary projections. (In this approach a single object represents "any" tuple, while in the former scheme we could end up needing a class for every desired projection! However, I have found in my practice that natural joins which proceed by mapping the constituent tuples directly into their object counterparts are much nicer to work with, even if they do make the relational engine work harder.)
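
For contrast, here is a rough sketch of the "super method" style. The class below is a hypothetical stand-in, not java.sql.ResultSet itself; it only borrows the shape of ResultSet's getString(String) and getInt(String) accessors, and the attribute names used with it are assumptions:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical "anonymous" tuple: one super method per type,
// with the attribute selected by a parameter.
final class AnonymousTuple {
    private final Map<String, Object> values = new HashMap<>();

    // Builder-style helper used only to populate the sketch.
    AnonymousTuple with(String attribute, Object value) {
        values.put(attribute, value);
        return this;
    }

    String getString(String attribute) {
        return (String) values.get(attribute);   // wrong type is detected only at run time
    }

    int getInt(String attribute) {
        return (Integer) values.get(attribute);  // likewise a run-time check
    }
}
```

One class can stand for any projection, but asking for an attribute under the wrong type compiles cleanly and fails with a ClassCastException when it runs, which is the type-safety cost described above.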

 

A question does arise when it comes to assigning new information to a tuple--tuples in the relational algebra are values, and yet often we want to manipulate them as if they were variables. We can certainly, in the object world, treat them differently, just as anyone is free to implement the relational algebra (or anything else for that matter) in any way they like. But what we need to recognise is that if we let users manipulate a tuple, this has ramifications in terms of maintaining the integrity of the relation it belongs to. This integrity maintenance will likely be quite complex, and may in fact border on unwieldy: there are integrity constraints on the attributes, on the tuple, on the parent relation and on the database, and there are transactional requirements in terms of ACID.

 

I think the ideal tuple is an immutable (or value) one, but OO purists will hate that. So I (and others--viz. EJB and the unfortunately named ValueObjects from J2EE) will allow modification on the understanding that what you have really done is create a new (mutable) tuple which won't be reconciled with the database until you perform some kind of commit() operation on some all-encompassing TransactionManager.
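
The idea can be sketched roughly as follows. Both classes are hypothetical; in particular TransactionManagerSketch is not the J2EE transaction API, only an in-memory stand-in, and all constraint checking is left out. For simplicity the changed tuple is shown as a fresh immutable value rather than a mutable copy:

```java
import java.util.ArrayList;
import java.util.List;

// Immutable value: "modification" yields a new tuple and leaves the original untouched.
final class PersonTuple {
    private final int id;
    private final String city;

    PersonTuple(int id, String city) {
        this.id = id;
        this.city = city;
    }

    int getId() { return id; }
    String getCity() { return city; }

    PersonTuple withCity(String newCity) { return new PersonTuple(id, newCity); }
}

// Hypothetical stand-in for an all-encompassing transaction manager.
final class TransactionManagerSketch {
    private final List<PersonTuple> pending = new ArrayList<>();
    private final List<PersonTuple> committed = new ArrayList<>();

    void stage(PersonTuple changed) { pending.add(changed); }

    // Staged tuples are reconciled only when commit() is called;
    // a real implementation would check integrity constraints here.
    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    List<PersonTuple> committedTuples() { return new ArrayList<>(committed); }
}
```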

 

The Practice

 

Let's put this into action with a concrete example. Presume a system has been created in which an |Employee| has a salary accessor |getSalary()| of type |Salary|. If peer objects to |Employee| want to know what salary she has, we let them query her directly, which is entirely as it should be. It is up to the |Employee| to decide whether to tell her peer her salary. This does not make accessor implementation expensive or complicated, because most enterprise application servers write the code inside |getSalary()| for you. Currently most do not allow you to use the correct "information hidden" |Salary| type, but you could achieve this by creating a |Salary| relation and corresponding tuple object and then making |getSalary()| traverse a one-to-one relationship.

 

It does not matter if the company goes global, and the salaries are in many currencies, or if the expansion causes salaries to rise beyond the 16-bit data width you chose. All of the hundreds of places in the code where the accessor is used will proceed to use the |Salary| object to access the raw figure as a precision-independent language number, and can later be expanded to deal with the currency using a new |getCurrency()| method. It is true that eventually you will have to have a basic type+, even if that basic type happens to be the serialized binary form of the object in question (not very useful for relational database implementations, but there you go).
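
A minimal sketch of what such a |Salary| type might look like in plain Java. The accessor name getAmount(), the class name SalariedEmployee, and the choice of BigDecimal and java.util.Currency are assumptions made for illustration; the article only names |getSalary()| and |getCurrency()|:

```java
import java.math.BigDecimal;
import java.util.Currency;

// Salary hides its representation; callers never learn the stored data width.
final class Salary {
    private final BigDecimal amount;
    private final Currency currency;

    Salary(BigDecimal amount, Currency currency) {
        this.amount = amount;
        this.currency = currency;
    }

    // Precision-independent figure (accessor name is an assumption).
    BigDecimal getAmount() { return amount; }

    // Can be added later without touching existing getAmount() callers.
    Currency getCurrency() { return currency; }
}

final class SalariedEmployee {
    private final Salary salary;

    SalariedEmployee(Salary salary) { this.salary = salary; }

    Salary getSalary() { return salary; }
}
```

Existing callers of getSalary() keep compiling when getCurrency() appears, because nothing about the type they already use has changed.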

 

+ By basic type I mean something numeric (I call it "algebraic") or symbolic (I call it "textual.") Things like "Strings," "Colours," "Numbers" ("Integers," "Doubles," "Floats," "Longs...") and so on.

 

Although there is a "gotcha" situation if you choose a basic type for something that turns out later to not truly be basic, there is probably an out available in the relational model by treating the value as split across many (including some new) attributes and adding a method to the object to combine them into their correct non-basic type. In this way, you remain compatible with all the existing peer objects which used the old accessor, yet still provide the correct encapsulated answer to new peers++ or later versions of existing peers.

 

++ Of course, their behaviour will need to be looked at, but using a language-provided deprecation flag (such as Java has) may reduce some of the legwork here.
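
As a rough illustration of that way out, suppose (hypothetically) the salary had first been exposed as a plain int and a currency attribute was added afterwards. The names Pay, LegacyEmployee and getPay() are invented for this sketch; Java's @Deprecated annotation is the language-provided flag referred to above:

```java
import java.math.BigDecimal;

// Non-basic type combining the old attribute with the new one.
final class Pay {
    private final BigDecimal amount;
    private final String currencyCode;

    Pay(BigDecimal amount, String currencyCode) {
        this.amount = amount;
        this.currencyCode = currencyCode;
    }

    BigDecimal getAmount() { return amount; }
    String getCurrencyCode() { return currencyCode; }
}

final class LegacyEmployee {
    private final int salaryAmount;       // the original basic-typed attribute
    private final String salaryCurrency;  // new attribute added later

    LegacyEmployee(int salaryAmount, String salaryCurrency) {
        this.salaryAmount = salaryAmount;
        this.salaryCurrency = salaryCurrency;
    }

    /** @deprecated ignores the currency; prefer {@link #getPay()}. */
    @Deprecated
    int getSalary() { return salaryAmount; }

    // Combines the split attributes into the correct non-basic type.
    Pay getPay() { return new Pay(BigDecimal.valueOf(salaryAmount), salaryCurrency); }
}
```

Old peers continue to call getSalary() and merely receive a compiler warning, which gives a list of call sites to review.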

 

What about going the other way--a type which was too complex becoming simplified? (That is, we made a design oversight or mistake.) Consider that we begin an application's production life with an |Employee|'s |getName()| accessor. Say we did a bad thing and used a basic type for delimited information. Say, then, |getName()| returns a string in the following format (or more likely, we didn't document it, and someone assumed that all names were of this form):

 

      1. First Name

      2. Whitespace

      3. Last Name

 

We will hit a problem if anyone has created (or in-lined) a utility that assumes this format in order to break it down. Note that this is not a bad thing if we have documented it (consider filenames, regular expressions and URLs: three commonly formatted and parsed fully transparent types), though we should realise we have seriously restricted ourselves by doing so. (This is a good thing for our filenames, regular expressions and URLs, where standards and cross-language compatibility are the name of the game.)

 

The problem can be shown by the following example: suppose two men called /John Smith/ start working for the same company. A quick fix resorts to adding a middle name. The format of the string returned from |getName()| changes:

 

      1. First Name

      2. Whitespace

      3. Middle Name

      4. Whitespace

      5. Last Name

 

If the original format was assumed by any code which displays, stores, encodes, decodes, compares or collates, then all of this code must be rewritten! So this was clearly a stupid thing to do. What we should have done was break the one-to-one correspondence between what |getName()| returns and what the relation behind |Employee| actually stores.

 

Say we change our relation by replacing it with one which separates first name, middle name and last name (as it probably should have done from the start). In a database with user-defined types, that is really easy, but even in one without we can still replace the attribute |Name| with the attributes |FirstName|, |MiddleName| and |LastName| (and create a view combining all three that pretends to be the original base relation, if we so desire). We then change the object-relational mapping in one of two ways: either we add a new |getFullName()| method which returns an object analogue of our user-defined type, |Name|; or we create the three new accessors |getFirstName()|, |getMiddleName()| and |getLastName()| (with limited accessibility if so desired), plus one, |getFullName()|, which combines them into the correct encapsulated type |Name| we should have started with. Finally, we create a new implementation of |getName()| which returns just |FirstName| and |LastName| concatenated together, satisfying all previous parsers syntactically (if not semantically).
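
The second variant of that mapping could be sketched like this. The method names are the ones given above; the class name NamedEmployee, the constructor, the single-space separator, and the assumption that a middle name is always present are illustrative only:

```java
// Object analogue of the user-defined type: the correct encapsulated name.
final class Name {
    private final String first;
    private final String middle;
    private final String last;

    Name(String first, String middle, String last) {
        this.first = first;
        this.middle = middle;
        this.last = last;
    }

    String getFirst() { return first; }
    String getMiddle() { return middle; }
    String getLast() { return last; }

    @Override
    public String toString() { return first + " " + middle + " " + last; }
}

final class NamedEmployee {
    private final String firstName;
    private final String middleName;
    private final String lastName;

    NamedEmployee(String firstName, String middleName, String lastName) {
        this.firstName = firstName;
        this.middleName = middleName;
        this.lastName = lastName;
    }

    String getFirstName() { return firstName; }
    String getMiddleName() { return middleName; }
    String getLastName() { return lastName; }

    // New accessor returning the encapsulated type.
    Name getFullName() { return new Name(firstName, middleName, lastName); }

    // Legacy accessor: keeps the old "First Last" shape for existing parsers.
    String getName() { return firstName + " " + lastName; }
}
```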

 

Once again, as with our salary example, this latter solution is backwards compatible with code that relies on the format of |getName()|. It recognises that the definition was deficient, and that unfortunate, unsuspecting coders therefore made assumptions about the format of the data it returned in the absence of a helpful comment: "do not rely on the format of the name returned." However, while it won't break code by causing it to crash or enter untested conditions, it may still cause minor anomalies. We mentioned collation and comparison being broken if we just changed the format returned by |getName()| outright: we can't help those cases with this fix either. For example, "John Smith" equals "John Smith" even if one is John Winston Smith and the other is John Walker Smith. So we still need to go and clean up after our original mistake. C’est la vie.
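
The remaining anomaly is easy to reproduce. In this small self-contained sketch, FullName and legacyForm() are hypothetical names standing in for the |Name| type and the backwards-compatible |getName()| string:

```java
import java.util.Objects;

final class FullName {
    private final String first;
    private final String middle;
    private final String last;

    FullName(String first, String middle, String last) {
        this.first = first;
        this.middle = middle;
        this.last = last;
    }

    // Same shape as the backwards-compatible getName(): the middle name is dropped.
    String legacyForm() { return first + " " + last; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FullName)) return false;
        FullName other = (FullName) o;
        return first.equals(other.first)
                && middle.equals(other.middle)
                && last.equals(other.last);
    }

    @Override
    public int hashCode() { return Objects.hash(first, middle, last); }
}

final class NameCollisionDemo {
    public static void main(String[] args) {
        FullName a = new FullName("John", "Winston", "Smith");
        FullName b = new FullName("John", "Walker", "Smith");
        System.out.println(a.legacyForm().equals(b.legacyForm())); // prints true
        System.out.println(a.equals(b));                           // prints false
    }
}
```

Any code that still compares or collates on the legacy string treats the two employees as the same person, so those call sites have to be found and moved to the full type.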

 

References

 

http://www.holub.com/publications/articles/index.html

http://www.javaworld.com/javaworld/jw-07-1999/jw-07-toolbox.html

http://www.joot.com/dave/writings/articles/encapsulation.shtml

http://www.dbdebunk.com/page/page/1706754.htm

http://www.dbdebunk.com/page/page/1514529.htm


C. J. Date Responds: I don't want to offer a blow-by-blow response here; I just want to say that the problem at issue (previously described in Dave Jarvis's piece Why GET() Accessors Are Bad) is solved quite elegantly and straightforwardly, I believe, by the mechanism Hugh Darwen and I call "possreps" (short for possible representations).  See our book FOUNDATION FOR FUTURE DATABASE SYSTEMS: THE THIRD MANIFESTO (2nd edition).  The 3rd edition is due for publication later this year, under the title DATABASES, TYPES, AND THE RELATIONAL MODEL: THE THIRD MANIFESTO.


Posted 5/20/2005