Last year, in response to a phone and email discussion with
me, Josh Hewitt posted at comp.databases.theory what he called an analysis of
some performance implications of the TransRelational™ Model (which we deemed
incorrect, and advised against publication). We published our reactions to it
(see References) within the limits of what we could legally disclose, but the
exchange in response to his analysis still left a misleading impression.
So I decided to publish our original exchange and let the
reader form his/her own opinion.
From: Lajos Nagy (aka Josh Hewitt)
Date: 5 Oct 2004
You mentioned that Mr. Date has begun to write a book on the TransRelational Model. If I understood you correctly, he cannot publish it until the legal battles are resolved. May I ask you for a big favor? Can I read the draft of the book? Of course I know that this is a serious
matter and I also know that reading the book is going to be useful for me but
not necessarily useful for Mr. Date. I would love to help so how about this: I would review the book and supply my comments after reading it. This way Mr.
Date will also benefit from my reading the book and he could also say that he
did not publish it but only gave it to somebody for review.
What do you think?
Would Mr. Date be so kind as to trust me with the book?
(Of course, I would not give it to anybody, or anything like that.)
I contact you in this matter because you are close to Mr.
Date and I do not know him as well to ask something like this from him.
From: Fabian Pascal
I will see Chris tomorrow and ask him, he's the only one who
can authorize it.
Steve's reply to your request. For what it's worth.
A lot of progress, but also a huge amount of work, at a
critical stage, all hands on deck, extremely time consuming, I saw email from
the PhD student, have it in my mind on of a bit of a back burner, just been too
jammed to get back to him quite yet - given triage priorities re everything
else.
The book question, for better or worse, of depends on this
also. I'm not in a position to just disseminate this stuff (even under NDA)
unless it is in the context of something that could help RTI - so it ultimately
revolves around the issue that I need to speak to him about directly - whether
there is a good fit here that works for both of us and if so working out a plan
going forward Until I explore this with him, I simply don't know. And until
now, I've just been too jammed.
I can understand that he has his own set of priorities and
would like to know, PhD student who wants to write a thesis and all that and
I'm sympathetic to that and also to the possible synergies between us. I am
thinking about this in the background, (together with one or two other
possibilities) but there are still only 24 hours in a day, and it's been no
distractions all hands on deck with everything going on around the RTI restart
efforts.
Please communicate this to him and please let him know that
I did get his email and am interested in talking to him and exploring this, but
that I've just been swamped, so that I'm sorry but that if he can hold on a bit
longer I will definitely get back to him as soon as I can. Don't know what else
I can say right now.
From: Lajos Nagy
Date: 21 Oct 2004
Last week I read a couple of articles on the physical
implementation issues of relational DBMSs.
I was particularly interested in indexing techniques (costs/benefits)
and improving (natural) join performance. These articles made me think a little
bit about the TransRelational Model (TRM) and what I came up with is a little
disturbing.
I would really like to know what Steve Tarin has to say about
my results.
This is a long email so I copied the 'Conclusions' section
from the end of the text so you can read it as a kind of an 'Abstract' of the
analysis.
From: Fabian Pascal
I'll forward it to Steve, but don't hold your breath, he's
buried in business stuff.
Whatever your analysis argues, the fact is that a
prototype—not even the full product—was tested with humongous transactional
databases, and yielded sub-second response. Just so you know.
From: Lajos Nagy
My analysis is based on the two core features of the
TransRelational Model: (1) vertical
decomposition of relations and (2) keeping all columns sorted
simultaneously. (1) incurs prohibitive
record reconstruction costs during retrieval and (2) incurs prohibitive search
costs during tuple insertion. (Prohibitive, of course, when we try to use the model on 'life-sized' databases.)
The TRM is faster than a traditional DBMS only if we do a
restriction query on an attribute that is not indexed in the traditional DBMS.
However, this statement of superiority is sort of deceiving since the
traditional DBMS has to do a full table-scan and almost anything is faster than
that. (If you read my email you will see that even in this case the TRM is not *much* faster.) The problem is that we have to pay too big a price for such a questionable gain.
By the way, my analysis is nothing complicated. It is a simple, back-of-envelope type
analysis. I mean, if doing some very
basic calculations raises such serious concerns what about a more thorough one?
It is quite possible that Steve Tarin has modified the model since
it has been published and used some new ideas in implementing the prototype.
(In fact, that is why I would love to hear his reaction to my concerns.) I am
more or less convinced (based on my analysis) that in its *current* form the
TransRelational Model is simply not up to the job.
Please, please, take the time to read my analysis. I promise, it is an easy read. I would appreciate if you could point to
some serious flaws in my reasoning which would make my results invalid. Do not get me wrong, even for a moment, I am
*for* the TRM. I would love to see a
new and efficient way of implementing RDBMSs.
I criticize it to make it better!
From: Fabian Pascal
I raised this issue with Chris and Steve myself originally,
and they assured me that is not a problem. I don’t know enough about the
details to assess their grounds for saying so.
May have something to do with TRM tables and files being kept in memory,
and special algorithms Required Technology developed to address such issues,
but let's see what they reply to you.
Another point to recall is that TRM, while closer to storage
than RM, is not a physical model, and physical implementation has also a role
to play.
Incidentally, what TRM stores are not columns, but
(instantiated values of) domains, which is not the same thing.
FYI, here’s Steve’s response. I told you so.
I skimmed it briefly - enough to see it was nonsense. Given
both its lack of substance and the extreme importance and time consuming nature
of the RTI restart efforts in which I'm so heavily engaged, responding to this
sort of thing is not something I could justify spending time on right now.
I'll call you in the next few days. I've just been way too
busy. I can fill you in verbally when I call you - in a couple of minutes - why
his comments are so off the wall.
My initial reaction was that this is not worthy of a
response - plus as I said, I just don't have the time for it - but if you still
think otherwise perhaps I could have someone else to spend a few minutes to
shoot off a 1-2 paragraph response. I just don't see its contents as something
that justifies anything more than that under any circumstances - especially in
my current situation re the RTI restart efforts.
From: Lajos Nagy
I would *love* to read that "1-2 paragraph
response." I will tell you why. I wanted to do research on the TRM
but now it seems that in its current form it is not really a feasible
implementation model. There would be no better news for me than somebody
proving me wrong with a "1-2 paragraph response."
That would make my life much easier because
then I could base my dissertation on the TRM, for the most part that is.
Now it is one of those situations when I want to be proved
wrong! But, *please*, if what I have
written is so obviously wrong then tell me *why* it is so obviously wrong so I
can go back and do my homework. It is
just a simple analysis but it is *still* an analysis and I am curious as to
what huge mistakes I made that rendered it 'nonsense'.
P.S. Really, Steve Tarin might think that I want to bash his
invention. On the contrary! I would love to see it work because that
would give me a topic to research.
From: Fabian Pascal
It's up to Steve. But let me tell you something: Steve is a
genius. If he says that the argument is not valid, I can almost guarantee he
knows what he's talking about. It seems inconceivable to me that a mind of this
caliber would have based his work on something that can be shown fundamentally
flawed by a few back-of-napkin calculations.
You can't even imagine what he's going thru. Somebody has
practically stolen his life's work, and he is buried deep in preventing that.
It's one of those frustrations where there is no other choice but to wait.
Sorry.
He's not the type.
We'll see what Chris says when I send him your note, but I
suspect he will defer to Steve too.
From: Lajos Nagy
It is not fundamentally flawed. It just does not scale well when we try to use it for large
datasets that do not fit in memory. And
again, if the counter-argument is laughingly simple: why not give it?
What is *your* opinion?
Have you read my mail? Have you
found any apparent flaws in the reasoning?
Are any of my assumptions completely incorrect? Because if neither is the case then the
conclusions are inevitable. Maybe after
understanding the concerns for yourself, you will become as curious as I am as
to what that simple '1-2 paragraph' answer might be.
From: Fabian Pascal
Given what his objective was—an implementation of RM that
would maximize performance—that would be a fundamental flaw. Furthermore, this
has not proved to be the case in trials with huge transactional databases,
which should have had, had your argument been valid.
If you had to go thru what he's going thru, you would
understand why he can't bother. Anyway, I betcha he's not gonna call me either,
not for even for a simple argument. I am surprised he's still alive.
To be honest, that's not my cup of soup, so I will defer to
Steve and Chris. I was never really interested in performance. Not because it's
not important, but because it's a rather boring aspect to me.
From: Lajos Nagy
Date: Nov 11 2004
I took the liberty and posted the
results of my initial analysis to comp.databases.theory. I would like to know what others think about it:
By the way, do you know whether Chris Date has anything to
say on this back-of-napkin calculation?
From: Fabian Pascal
It probably would have been wiser to wait for a response from
Tarin. If he's right...
I sent Chris a copy, but got no reply yet. He's currently
traveling and he's too busy to handle even other stuff I sent him.
From: Lajos Nagy
Maybe, but it won't hurt to see what others think about it.
I can understand that.
From: Fabian Pascal
Date: Nov 18 2004
Had a chance to ask Chris about your comments on TRM.
In an Appendix in his manuscript it is stated that the
TRM algorithms described are for in memory/no updates only. You assumed that they are the same for on disk/updates, which
is incorrect; they are different. Unfortunately, the latter are
covered by NDA and cannot be disclosed at this time.
From: Lajos Nagy
Could have saved me some trouble if I had known this earlier
:) Anyway, now that it is obvious that the TRM, in its published form, is not
suitable for disk-based databases I really begin to wonder how you can modify
it to allow updates and disk-based storage.
Incidentally, the discussion in the comp.databases.theory newsgroup
also boiled down to the conclusion that the TRM can only be considered for main
memory databases.
From: Fabian Pascal
Didn't I tell you to wait?
You misunderstood: I did not imply
it's not suitable for disk-based; I only said the disk algorithms are not the
same as those for in-memory; and that they resolved performance issues.
That's what happens if you jump to conclusions without
information.
From: Lajos Nagy
Date: Dec 5 2004
I've read your post on the Debunkings site regarding the
TransRelational Model (TRM). Actually the assumptions I made are quite central to the TRM and have little to do with the published algorithms themselves.
I will re-iterate them for the sake of the discussion:
Assumption 1: Attribute values that belong to the same tuple
are not stored physically close to each other.
Assumption 2: Attribute values are stored in some sorted
order.
The second assumption more or less implies the first one, by
the way. It is important to understand that it is not the algorithms that create problems.
Problem 1: Lack of physical locality of values belonging to
the same tuple incurs high record reconstruction costs. Proportional to _both_ the number of
attributes _and_ the number of tuples in the result set.
Problem 2: The need to maintain attribute values in sorted
order incurs high insertion/update/deletion costs. Proportional to _both_ the number of attributes _and_ the number
of tuples affected.
There is no silver bullet.
Attribute values of a record are either close to each other
physically or not. If they are not then the laws of physics will force us to gather the scattered parts. (Problem 1 created by assumption 1.)
Attribute values of a record are either stored in some sorted
order or not. If they are sorted then it takes effort to maintain this sortedness. This effort is proportional to the number of attributes that are stored in a sorted order. Again, the laws of physics force us. (Problem 2 created by assumption 2.)
As you can see the core problems has little to do with the
algorithms of choice.
(Traditional DBMSs use indexing structures (like B-Trees)
because they want to achieve physical locality (for cheap record
reconstruction) and sortedness (for fast searching). The only way they can do this is by storing some attribute values
(at least) twice. But then, this is
quite an old story, isn't it?)
"... suffice it to say that they can and have been
addressed satisfactorily, and the solutions have been implemented."
I did not want to say this so far but I will now: I doubt it.
Nevertheless, it is still possible that the TRM is an
efficient way of implementing main-memory/read-only databases. This might also explain the
"sub-second" response times in the prototype.
Also, it might be interesting to see how the TRM can be
extended to the main-memory/read-write case which seems to be a more promising
endeavor.
From: Fabian Pascal
I will pass this to Steve. You will have to wait until he/his
colleague publish their response.
From: Fabian Pascal
Date: Jul 17 2005
>>Yeah, you are right. Actually I received an email
from Fabian Pascal in which he confirmed (quoting Chris Date) that the
algorithms and data structures described in the Appendix (and also in
the patent) are "for in memory/no updates" only. (However, maybe this
'feature' should be stressed a bit more when advocating the TRM.)
This is so misleading as to be an almost lie. You failed to
say that we explicitly said in public, as well as in private to you, that there
are algorithms for disk-based operation, which are different than
those for in-memory operation.
Such a comment is a self-serving non-disclosure to get out
from under the criticism that you had no basis no which to claim what you did.
All the exchange in that thread is practically meaningless,
because neither of the people involved have any clue as to what TRM really is.
From: Lajos Nagy
I first read about the TRM in the appendix of Chris Date's
book (INTRO...). It seemed interesting
so I took the pain and went through the actual patent(s) to learn more about
it. It turned out that the patent(s)
did not add anything substantial to what I already knew from the appendix. I
wouldn't have written about the TRM had I not seen it being advertised on DBDebunk
as the Holy Grail of RDBMS implementation.
For example, I kindly bring to your attention the wording of the
"Go Faster!" seminar:
As a reminder: my analysis of the published form of the TRM
begins with passages taken verbatim from this seminar announcement.
Well, I read it again thoroughly but I
couldn't find any mention of the "in memory/no updates" limitation,
although the claims seem to assume such organization. (Insofar that they are
the same claims that one finds in the appendix after the description of the
"in memory/no updates" version of the TRM.)
Look, I'm as much for a radical performance improvement in
RDBMS implementation as anybody. If a technology exits that would enable such a
radical improvement so much for the better!
Because no matter what happens to RequiredTech or its trade secrets if
it can be done it will be done (if not by Steve Tarin then by somebody
else.) So, for now, I'll just wait and
see what comes of the TRM.
From: Fabian Pascal
Unfortunately, your defense is irrelevant to my criticism,
because it ignores most important facts:
1. You contacted me to obtain the information necessary for
your analysis and, while I tried to get it from Tarin, I warned you not to
publish anything without that information. You ignored that and published
anyway.
2. We explicitly said that even though only the
algorithms for in-memory operation were used to illustrate TRM, this should not
be interpreted as TRM being solely for such operation. Don't know how you
missed that, but if you did, you should have admitted it, which you have not.
3. After you published, Tarin, Date and Ken explicitly said
that (a) there is innovation regarding disk-based operation that cannot be
disclosed (b) there is a prototype implementation that was tested and
confirms that your analysis had no basis in fact.
You have not mentioned any of this in your online postings.
What is more, you still claimed in exchanges that we confirmed that TRM is for
in-memory only, and you did not counter the claims of other ignorami that TRM
is a hoax. You never updated your comments to reflect this, nor linked to our
clarifications at dbdebunk. That is self-servingly untrue and dishonest.
What I am going to do is dig our original exchange on the
subject and prove my point by publishing it.
My suggestion to you is that, as a student, you should be
more humble. Too much ambition to make a splash can cause you trouble, unless,
of course, you want to conform to the crappy industry and not be any different.
From: Fabian Pascal
I have prepared our complete exchange on the subject for future posting on dbdebunk.
Here's Chris' reply to your analysis, which I relayed to you:
In an Appendix in his manuscript it is stated that the TRM
algorithms described are for in memory/no updates only. You assumed that they
are the same for on disk/updates, which is incorrect; they are different. Unfortunately, the latter are
covered by NDA and cannot be disclosed at this time.
Here's what you said following that reply:
Anyway, now that it is obvious that the TRM, in its published
form, is not suitable for disk-based databases I
really begin to wonder how you can modify it to allow updates and disk-based
storage.
You really think that the latter is consistent with the
former? Incidentally: There is no "TRM in published form", that was
the whole point. What you had is an outline of the concept and an illustration
of one possible in-memory operation.
Here's what you said in the online exchange:
Yeah, you are right. Actually I received an email from Fabian
Pascal in which he confirmed (quoting Chris Date) that the algorithms and data
structures described in the appendix (and also in the patent) are "for in
memory/no updates" only. (However, maybe this
'feature' should be stressed a bit more when advocating the TRM.)
You don't see how misleading this is? You focused only
on the fact that the Appendix illustrated in-memory operation, but said nothing
about the existence of different, but undisclosable disk-based algorithms, and
a tested prototype.
What is more, you said you did not believe it existed,
practically calling us liars.
Case closed.
References
More
on the TransRelational Model
More
on “The Final NULL in the Coffin”
Defending
the TransRelational Model
Posted 7/29/05