MORE ON TRM: THE WRONG END OF THE STICK
with Fabian Pascal

 

 

 

Last year, in response to a phone and email discussion with me, Josh Hewitt posted at comp.databases.theory what he called an analysis of some performance implications of the TransRelational™ Model (which we deemed incorrect, and advised against publication). We published our reactions to it (see References) within the limits of what we could legally disclose, but the exchange in response to his analysis still left a misleading impression.

 

So I decided to publish our original exchange and let the reader form his/her own opinion.

 

 

From: Lajos Nagy (aka Josh Hewitt)

Date: 5 Oct 2004

 

You mentioned that Mr. Date has begun to write a book on the TransRelational Model.  If I understood you correctly, he cannot publish it until the legal battles are resolved.  May I ask you for a big favor? Can I read the draft of the book?  Of course I know that this is a serious matter and I also know that reading the book is going to be useful for me but not necessarily useful for Mr. Date.  I would love to help so how about this: I would review the book and supply my comments after reading it.  This way Mr. Date will also benefit from my reading the book and he could also say that he did not publish it but only gave it to somebody for review.

 

What do you think?  Would Mr. Date be so kind as to trust me with the book?  (Of course, I would not give it to anybody, or anything like that.)

 

I contact you in this matter because you are close to Mr. Date and I do not know him as well to ask something like this from him.

 

 

From: Fabian Pascal

 

I will see Chris tomorrow and ask him, he's the only one who can authorize it.

 

Steve's reply to your request. For what it's worth.

 

A lot of progress, but also a huge amount of work, at a critical stage, all hands on deck, extremely time consuming, I saw email from the PhD student, have it in my mind on of a bit of a back burner, just been too jammed to get back to him quite yet - given triage priorities re everything else.

 

The book question, for better or worse, of depends on this also. I'm not in a position to just disseminate this stuff (even under NDA) unless it is in the context of something that could help RTI - so it ultimately revolves around the issue that I need to speak to him about directly - whether there is a good fit here that works for both of us and if so working out a plan going forward Until I explore this with him, I simply don't know. And until now, I've just been too jammed.

 

I can understand that he has his own set of priorities and would like to know, PhD student who wants to write a thesis and all that and I'm sympathetic to that and also to the possible synergies between us. I am thinking about this in the background, (together with one or two other possibilities) but there are still only 24 hours in a day, and it's been no distractions all hands on deck with everything going on around the RTI restart efforts.

 

Please communicate this to him and please let him know that I did get his email and am interested in talking to him and exploring this, but that I've just been swamped, so that I'm sorry but that if he can hold on a bit longer I will definitely get back to him as soon as I can. Don't know what else I can say right now.

 

 

From: Lajos Nagy

Date: 21 Oct 2004

 

Last week I read a couple of articles on the physical implementation issues of relational DBMSs.  I was particularly interested in indexing techniques (costs/benefits) and improving (natural) join performance. These articles made me think a little bit about the TransRelational Model (TRM) and what I came up with is a little disturbing.

 

I would really like to know what Steve Tarin has to say about my results.

 

This is a long email so I copied the 'Conclusions' section from the end of the text so you can read it as a kind of an 'Abstract' of the analysis.

 

 

From: Fabian Pascal

 

I'll forward it to Steve, but don't hold your breath, he's buried in business stuff.

 

Whatever your analysis argues, the fact is that a prototype—not even the full product—was tested with humongous transactional databases, and yielded sub-second response. Just so you know.

 

 

From: Lajos Nagy

 

My analysis is based on the two core features of the TransRelational Model:  (1) vertical decomposition of relations and (2) keeping all columns sorted simultaneously.  (1) incurs prohibitive record reconstruction costs during retrieval and (2) incurs prohibitive search

costs during tuple insertion.  (Prohibitive, of course, when we try to use the model on 'life-sized' databases.)

 

The TRM is faster than a traditional DBMS only if we do a restriction query on an attribute that is not indexed in the traditional DBMS. However, this statement of superiority is sort of deceiving since the traditional DBMS has to do a full table-scan and almost anything is faster than that.  (If you read my email you will see that even in this case the TRM is not *much* faster.)  The problem is that we have to pay too big a price for such a questionable gain.

 

By the way, my analysis is nothing complicated.  It is a simple, back-of-envelope type analysis.  I mean, if doing some very basic calculations raises such serious concerns what about a more thorough one?

 

It is quite possible that Steve Tarin has modified the model since it has been published and used some new ideas in implementing the prototype. (In fact, that is why I would love to hear his reaction to my concerns.) I am more or less convinced (based on my analysis) that in its *current* form the TransRelational Model is simply not up to the job.

 

Please, please, take the time to read my analysis.  I promise, it is an easy read.  I would appreciate if you could point to some serious flaws in my reasoning which would make my results invalid.  Do not get me wrong, even for a moment, I am *for* the TRM.  I would love to see a new and efficient way of implementing RDBMSs.  I criticize it to make it better!

 

 

From: Fabian Pascal

 

I raised this issue with Chris and Steve myself originally, and they assured me that is not a problem. I don’t know enough about the details to assess their grounds for saying so.  May have something to do with TRM tables and files being kept in memory, and special algorithms Required Technology developed to address such issues, but let's see what they reply to you.

 

Another point to recall is that TRM, while closer to storage than RM, is not a physical model, and physical implementation has also a role to play.

 

Incidentally, what TRM stores are not columns, but (instantiated values of) domains, which is not the same thing.

 

FYI, here’s Steve’s response. I told you so.

 

I skimmed it briefly - enough to see it was nonsense. Given both its lack of substance and the extreme importance and time consuming nature of the RTI restart efforts in which I'm so heavily engaged, responding to this sort of thing is not something I could justify spending time on right now.

 

I'll call you in the next few days. I've just been way too busy. I can fill you in verbally when I call you - in a couple of minutes - why his comments are so off the wall.

 

My initial reaction was that this is not worthy of a response - plus as I said, I just don't have the time for it - but if you still think otherwise perhaps I could have someone else to spend a few minutes to shoot off a 1-2 paragraph response. I just don't see its contents as something that justifies anything more than that under any circumstances - especially in my current situation re the RTI restart efforts.

 

 

From: Lajos Nagy

 

I would *love* to read that "1-2 paragraph response."  I will tell you why.  I wanted to do research on the TRM but now it seems that in its current form it is not really a feasible implementation model. There would be no better news for me than somebody proving me wrong with a "1-2 paragraph response."  That would make my life much easier because then I could base my dissertation on the TRM, for the most part that is.

 

Now it is one of those situations when I want to be proved wrong!  But, *please*, if what I have written is so obviously wrong then tell me *why* it is so obviously wrong so I can go back and do my homework.  It is just a simple analysis but it is *still* an analysis and I am curious as to what huge mistakes I made that rendered it 'nonsense'.

 

P.S. Really, Steve Tarin might think that I want to bash his invention.  On the contrary!  I would love to see it work because that would give me a topic to research.

 

 

From: Fabian Pascal

 

It's up to Steve. But let me tell you something: Steve is a genius. If he says that the argument is not valid, I can almost guarantee he knows what he's talking about. It seems inconceivable to me that a mind of this caliber would have based his work on something that can be shown fundamentally flawed by a few back-of-napkin calculations.

 

You can't even imagine what he's going thru. Somebody has practically stolen his life's work, and he is buried deep in preventing that. It's one of those frustrations where there is no other choice but to wait. Sorry.

 

He's not the type.

 

We'll see what Chris says when I send him your note, but I suspect he will defer to Steve too.

 

 

From: Lajos Nagy

 

It is not fundamentally flawed.  It just does not scale well when we try to use it for large datasets that do not fit in memory.  And again, if the counter-argument is laughingly simple: why not give it?

 

What is *your* opinion?  Have you read my mail?  Have you found any apparent flaws in the reasoning?  Are any of my assumptions completely incorrect?  Because if neither is the case then the conclusions are inevitable.  Maybe after understanding the concerns for yourself, you will become as curious as I am as to what that simple '1-2 paragraph' answer might be.

 

 

From: Fabian Pascal

 

Given what his objective was—an implementation of RM that would maximize performance—that would be a fundamental flaw. Furthermore, this has not proved to be the case in trials with huge transactional databases, which should have had, had your argument been valid.

 

If you had to go thru what he's going thru, you would understand why he can't bother. Anyway, I betcha he's not gonna call me either, not for even for a simple argument. I am surprised he's still alive.

 

To be honest, that's not my cup of soup, so I will defer to Steve and Chris. I was never really interested in performance. Not because it's not important, but because it's a rather boring aspect to me.

 

 

From: Lajos Nagy

Date: Nov 11 2004

 

I took the liberty and posted the results of my initial analysis to comp.databases.theory.  I would like to know what others think about it:

 

By the way, do you know whether Chris Date has anything to say on this back-of-napkin calculation?

 

 

From: Fabian Pascal

 

It probably would have been wiser to wait for a response from Tarin. If he's right...

 

I sent Chris a copy, but got no reply yet. He's currently traveling and he's too busy to handle even other stuff I sent him.

 

 

From: Lajos Nagy

 

Maybe, but it won't hurt to see what others think about it.

 

I can understand that.

 

 

From: Fabian Pascal

Date: Nov 18 2004

 

Had a chance to ask Chris about your comments on TRM.

 

In an Appendix in his manuscript it is stated that the TRM algorithms described are for in memory/no updates only. You assumed that they are the same for on disk/updates, which is incorrect; they are different. Unfortunately, the latter are covered by NDA and cannot be disclosed at this time.

 

 

From: Lajos Nagy

 

Could have saved me some trouble if I had known this earlier :) Anyway, now that it is obvious that the TRM, in its published form, is not suitable for disk-based databases I really begin to wonder how you can modify it to allow updates and disk-based storage.

 

Incidentally, the discussion in the comp.databases.theory newsgroup also boiled down to the conclusion that the TRM can only be considered for main memory databases.

 

 

From: Fabian Pascal

 

Didn't I tell you to wait?

 

You misunderstood: I did not imply it's not suitable for disk-based; I only said the disk algorithms are not the same as those for in-memory; and that they resolved performance issues.

 

That's what happens if you jump to conclusions without information.

 

 

From: Lajos Nagy

Date: Dec 5 2004

 

I've read your post on the Debunkings site regarding the TransRelational Model (TRM).  Actually the assumptions I made are quite central to the TRM and have little to do with the published algorithms themselves.

 

I will re-iterate them for the sake of the discussion:

 

Assumption 1: Attribute values that belong to the same tuple are not stored physically close to each other.

 

Assumption 2: Attribute values are stored in some sorted order.

 

The second assumption more or less implies the first one, by the way.  It is important to understand that it is not the algorithms that create problems.

 

Problem 1: Lack of physical locality of values belonging to the same tuple incurs high record reconstruction costs.  Proportional to _both_ the number of attributes _and_ the number of tuples in the result set.

 

Problem 2: The need to maintain attribute values in sorted order incurs high insertion/update/deletion costs.  Proportional to _both_ the number of attributes _and_ the number of tuples affected.

 

There is no silver bullet.

 

Attribute values of a record are either close to each other physically or not.  If they are not then the laws of physics will force us to gather the scattered parts.  (Problem 1 created by assumption 1.)

 

Attribute values of a record are either stored in some sorted order or not.  If they are sorted then it takes effort to maintain this sortedness. This effort is proportional to the number of attributes that are stored in a sorted order.  Again, the laws of physics force us.  (Problem 2 created by assumption 2.)

 

As you can see the core problems has little to do with the algorithms of choice.

 

(Traditional DBMSs use indexing structures (like B-Trees) because they want to achieve physical locality (for cheap record reconstruction) and sortedness (for fast searching).  The only way they can do this is by storing some attribute values (at least) twice.  But then, this is quite an old story, isn't it?)

 

"... suffice it to say that they can and have been addressed satisfactorily, and the solutions have been implemented."

 

I did not want to say this so far but I will now: I doubt it.

 

Nevertheless, it is still possible that the TRM is an efficient way of implementing main-memory/read-only databases.  This might also explain the "sub-second" response times in the prototype.

 

Also, it might be interesting to see how the TRM can be extended to the main-memory/read-write case which seems to be a more promising endeavor.

 

 

From: Fabian Pascal

 

I will pass this to Steve. You will have to wait until he/his colleague publish their response.

 

 

From: Fabian Pascal

Date: Jul 17 2005

 

>>Yeah, you are right. Actually I received an email from Fabian Pascal in which he confirmed (quoting Chris Date) that the algorithms and data structures described in the Appendix (and also in the patent) are "for in memory/no updates" only. (However, maybe this 'feature' should be stressed a bit more when advocating the TRM.)

 

This is so misleading as to be an almost lie. You failed to say that we explicitly said in public, as well as in private to you, that there are algorithms for disk-based operation, which are different than those for in-memory operation.

 

Such a comment is a self-serving non-disclosure to get out from under the criticism that you had no basis no which to claim what you did.

 

All the exchange in that thread is practically meaningless, because neither of the people involved have any clue as to what TRM really is.

 

 

From: Lajos Nagy

 

I first read about the TRM in the appendix of Chris Date's book (INTRO...).  It seemed interesting so I took the pain and went through the actual patent(s) to learn more about it.  It turned out that the patent(s) did not add anything substantial to what I already knew from the appendix. I wouldn't have written about the TRM had I not seen it being advertised on DBDebunk as the Holy Grail of RDBMS implementation.  For example, I kindly bring to your attention the wording of the "Go Faster!" seminar:

 

As a reminder: my analysis of the published form of the TRM begins with passages taken verbatim from this seminar announcement.  Well, I read it again thoroughly but I couldn't find any mention of the "in memory/no updates" limitation, although the claims seem to assume such organization. (Insofar that they are the same claims that one finds in the appendix after the description of the "in memory/no updates" version of the TRM.)

 

Look, I'm as much for a radical performance improvement in RDBMS implementation as anybody. If a technology exits that would enable such a radical improvement so much for the better!  Because no matter what happens to RequiredTech or its trade secrets if it can be done it will be done (if not by Steve Tarin then by somebody else.)  So, for now, I'll just wait and see what comes of the TRM.

 

 

From: Fabian Pascal

 

Unfortunately, your defense is irrelevant to my criticism, because it ignores most important facts:

 

1. You contacted me to obtain the information necessary for your analysis and, while I tried to get it from Tarin, I warned you not to publish anything without that information. You ignored that and published anyway.

 

2. We explicitly said that even though only the algorithms for in-memory operation were used to illustrate TRM, this should not be interpreted as TRM being solely for such operation. Don't know how you missed that, but if you did, you should have admitted it, which you have not.

 

3. After you published, Tarin, Date and Ken explicitly said that (a) there is innovation regarding disk-based operation that cannot be disclosed (b) there is a prototype implementation that was tested and confirms that your analysis had no basis in fact.

 

You have not mentioned any of this in your online postings. What is more, you still claimed in exchanges that we confirmed that TRM is for in-memory only, and you did not counter the claims of other ignorami that TRM is a hoax. You never updated your comments to reflect this, nor linked to our clarifications at dbdebunk. That is self-servingly untrue and dishonest.

 

What I am going to do is dig our original exchange on the subject and prove my point by publishing it.

 

My suggestion to you is that, as a student, you should be more humble. Too much ambition to make a splash can cause you trouble, unless, of course, you want to conform to the crappy industry and not be any different.

 

 

From: Fabian Pascal

 

I have prepared our complete exchange on the subject for future posting on dbdebunk.

 

Here's Chris' reply to your analysis, which I relayed to you:

 

In an Appendix in his manuscript it is stated that the TRM algorithms described are for in memory/no updates only. You assumed that they are the same for on disk/updates, which is incorrect; they are different. Unfortunately, the latter are covered by NDA and cannot be disclosed at this time.

 

Here's what you said following that reply:

 

Anyway, now that it is obvious that the TRM, in its published form, is not suitable for disk-based databases I really begin to wonder how you can modify it to allow updates and disk-based storage.

 

You really think that the latter is consistent with the former? Incidentally: There is no "TRM in published form", that was the whole point. What you had is an outline of the concept and an illustration of one possible in-memory operation.

 

Here's what you said in the online exchange:

 

Yeah, you are right. Actually I received an email from Fabian Pascal in which he confirmed (quoting Chris Date) that the algorithms and data structures described in the appendix (and also in the patent) are "for in memory/no updates" only. (However, maybe this 'feature' should be stressed a bit more when advocating the TRM.)

 

You don't see how misleading this is? You focused only on the fact that the Appendix illustrated in-memory operation, but said nothing about the existence of different, but undisclosable disk-based algorithms, and a tested prototype.

 

What is more, you said you did not believe it existed, practically calling us liars.

 

Case closed.

 

 

References

 

More on the TransRelational Model

More on “The Final NULL in the Coffin”

Defending the TransRelational Model

 

 

Posted 7/29/05