From: Fabian Pascal
To: Ken North, Robert Seiner
Date: 10/27/2003
I strongly suggest you stop using the term “unstructured
data”, which is a contradiction in terms. Do not reinforce fallacies.
See Unstructured Thinking.
From: K
Not always. Attached is an example of unstructured data (a
document fragment).
From Unstructured Thinking:
But, of course, that is inaccurate: the data is organized in
accordance with a "contract text data model", so to speak--words,
paragraphs, clauses, sections, and so on, with its own constraints and
operators. These would have to be implemented in the system that manages the
data, such that it can protect its integrity and derive information from it.
Again, not always. The fragment above is one of a collection
of scanned pages.
The problem in this case is incomplete content (missing
pages). Different people have filed away documents, diagrams, and partial
documents over a 40-year period.
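The "contract text data model" in the passage quoted above can be made concrete. What follows is a minimal sketch under stated assumptions, using Python's built-in `sqlite3` module. The tables `document`, `section` and `clause`, their columns, and the sample rows are hypothetical and purely illustrative. The point is that contract text, once given relational structure, gets integrity constraints (keys, foreign keys, `CHECK`) and an ordinary query as the operator that derives information from it.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not taken from any real contract-management system.
SCHEMA = """
CREATE TABLE document (
    doc_id INTEGER PRIMARY KEY,
    title  TEXT NOT NULL
);
CREATE TABLE section (
    doc_id     INTEGER NOT NULL REFERENCES document(doc_id),
    section_no INTEGER NOT NULL CHECK (section_no > 0),
    heading    TEXT NOT NULL,
    PRIMARY KEY (doc_id, section_no)
);
CREATE TABLE clause (
    doc_id     INTEGER NOT NULL,
    section_no INTEGER NOT NULL,
    clause_no  INTEGER NOT NULL CHECK (clause_no > 0),
    body       TEXT NOT NULL,
    PRIMARY KEY (doc_id, section_no, clause_no),
    FOREIGN KEY (doc_id, section_no) REFERENCES section(doc_id, section_no)
);
"""


def build(conn):
    """Create the hypothetical schema and load a few illustrative rows."""
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO document VALUES (1, 'Supply contract')")
    conn.executemany(
        "INSERT INTO section VALUES (?, ?, ?)",
        [(1, 1, "Definitions"), (1, 2, "Obligations")],
    )
    conn.executemany(
        "INSERT INTO clause VALUES (?, ?, ?, ?)",
        [(1, 1, 1, "(clause text)"), (1, 1, 2, "(clause text)"),
         (1, 2, 1, "(clause text)")],
    )


def clauses_per_section(conn):
    """Derive information with a plain relational query: clauses per section."""
    return conn.execute(
        """
        SELECT s.heading, COUNT(c.clause_no)
        FROM section AS s
        LEFT JOIN clause AS c
          ON c.doc_id = s.doc_id AND c.section_no = s.section_no
        GROUP BY s.doc_id, s.section_no
        ORDER BY s.section_no
        """
    ).fetchall()


def rejects_orphan_clause(conn):
    """Return True if the DBMS refuses a clause whose section does not exist."""
    try:
        conn.execute("INSERT INTO clause VALUES (1, 9, 1, 'orphan')")
        return False
    except sqlite3.IntegrityError:
        return True
```

Calling `build` on an in-memory connection (`sqlite3.connect(":memory:")`) and then `clauses_per_section` returns one row per section heading with its clause count. The foreign key is what lets the system, rather than the user, protect the integrity of the text's structure.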
From: Fabian Pascal
Always. There is no such thing as "unstructured
data". That means random noise, which has no structure whatsoever
and, therefore, is meaningless. It is the structure that gives
meaning/content and makes something data.
It has nothing to do with scanning, or incompleteness, or missing pages, or anything else.
It is structured, whatever it is. Diagrams have one type of structure,
partial documents different types of structure, but there is always some
structure by definition.
The term "unstructured data" is a misnomer based on a misconception: it
essentially refers to data that is not structured in tables, or spreadsheets,
or whatever; mainly text, graphics, etc. But that is not unstructured; it just
has structures different from tables or spreadsheets, that's all.
And that's a core issue, because structure determines the integrity
and manipulation of the data, which are different for each type of
structure. The point of relational structure is that it is the simplest
formal structure for integrity and manipulation. Any other structure adds
complexity, but no power.
Now, if you can't or won't structure it in tables, you must accept the
complexity, but that's different from saying that you can avoid relational
structure and yet get the same informational value out of the data. And it
certainly does not mean the data is unstructured.
Learn the fundamentals.
From: Ken North
You're saying that, by definition, data has structure.
So what's your term for factual information that lacks
structure, but is used for reasoning or discussion?
From: Fabian Pascal
Ken, Ken, c'mon, pay attention, I already answered that: It
is structured data, but just not relationally structured data, or not
spreadsheet-structured data, or what have you.
Which is precisely its problem: it's not the kind of structure that lends itself
to the formal simplicity of relational manipulation and integrity. To
get that kind of benefit you have to model relationally, that is, structure
it in R-tables. You can choose not to do that, and you may have reasons
for it, but then you buy into more complexity for less informational value.
That last part is something you never hear from anybody talking the "unstructured" nonsense.
The industry wants to have its cake and eat it too. It's fooling itself that it can
have XML databases "without doing db design" (read: without relational
structure) and get out of them as much as or more than it would get from
RDBMSs (even SQL DBMSs). And that's due mainly to ignorance.
From: Robert Seiner
To: Fabian Pascal
Do I understand you correctly that I should stop using the
term "unstructured data"? I just returned from a client, where I
spent part of the day discussing "unstructured data". I do
not think I will stop using the term, but I will read your article
nonetheless, and perhaps then we can talk about it.
From: Fabian Pascal
Yes, I mean you should stop, because it is nonsense. You can
do the same work without using contradictions in terms. Read what I told Ken
[above].
A model is defined as structure and behavior (manipulation,
which includes integrity).
Structure is what is being manipulated.
Can you tell me what [the title of your article] A Conceptual Model for
Unstructured Data means? Ponder that.
Are you going to EDF in Cherry Hill? If so, why don't you attend my session? I
cover this very topic.
From: Robert Seiner
Read my articles on that subject and you will know what I
mean. I give my very simple definitions of structured and unstructured
data. I will not be at EDF. Sorry I will miss your presentation.
From: Fabian Pascal
My point was that whatever you mean, it cannot possibly make
sense. If something lacks any structure, it is not modeled or “modelable” by
definition; when you model, you structure. A conceptual model of unstructured data
is a contradiction in terms. I'm afraid you don't have a proper understanding
of what a model is, which is quite common.
From: Robert Seiner
You are entering the term "model" into the
equation. Why are you doing that???
From: Fabian Pascal
I am not entering anything. I am only referring to the title
of your paper, which is A Conceptual Model for Unstructured Data.
That is a contradiction in terms.
Even without the model, unstructured data is a contradiction in terms. If it is
unstructured, it is not data; and if it is data, it is structured. By
definition.
To reiterate: unstructured means completely unorganized in any way; that's
random noise, and it cannot possibly be data. Data carries meaning and
the meaning is in the organization/structure. Period.
I do not know how to put the obvious more clearly than this.
Posted 10/17/03