MORE ON “UNSTRUCTURED” THINKING
with Fabian Pascal

From: Fabian Pascal

To: Ken North, Robert Seiner

Date: 10/27/2003

I strongly suggest you stop using the term “unstructured data”, which is a contradiction in terms. Do not reinforce fallacies.

See Unstructured Thinking.

From: Ken North

To: Fabian Pascal

Not always. Attached is an example of unstructured data (a document fragment).

From Unstructured Thinking:

But, of course, that is inaccurate: the data is organized in accordance with a "contract text data model", so to speak--words, paragraphs, clauses, sections, and so on, with its own constraints and operators. These would have to be implemented in the system that manages the data, such that it can protect its integrity and derive information from it.

Again, not always. The fragment above is one of a collection of scanned pages.

The problem in this case is incomplete content (missing pages). Different people have filed away documents, diagrams, and partial documents over a 40-year period.

From: Fabian Pascal

Always. There is no such thing as "unstructured data". Taken literally, that would mean random noise, which has no structure whatsoever and is therefore meaningless. It is structure that conveys meaning (content) and makes something data.

It has nothing to do with scanning, or incompleteness, or missing pages, or anything else. It is structured, whatever it is. Diagrams have one type of structure, partial documents have different types of structure, but there is always some structure, by definition.

The term "unstructured data" is a misnomer based on a misconception: it essentially refers to data that is not structured in tables, spreadsheets, or the like; mainly text, graphics, etc. But that is not unstructured; it just has structures different from those of tables or spreadsheets, that's all.

And that's a core issue, because structure determines the integrity and manipulation of the data, which are different for each type of structure. The point of relational structure is that it is the simplest formal structure for integrity and manipulation. Any other structure adds complexity, but no power.

Now, if you can't or won't structure it in tables, you must accept the complexity, but that is different from saying that you can avoid relational structure and yet get the same informational value out of the data. And it certainly does not mean unstructured.

Learn the fundamentals.

From: Ken North

You're saying that, by definition, data has structure.

So what's your term for factual information that lacks structure, but is used for reasoning or discussion?

From: Fabian Pascal

Ken, Ken, c'mon, pay attention, I already answered that: it is structured data, just not relationally structured data, or not spreadsheet-structured data, or what have you.

Which is precisely its problem: it's not the kind of structure that lends itself to the formal simplicity of relational manipulation and integrity. To get that kind of benefit you've got to model relationally--that is, structure it in R-tables. You can choose not to do that, and you may have reasons for it, but then you buy into more complexity for less informational value. That last part is what you never hear from anybody talking the unstructured nonsense.

The industry wants to have its cake and eat it too. It is fooling itself into thinking that it can have XML databases "without doing db design" (read: without relational structure) and get out of them the same as, or more than, what it would get with RDBMSs (even SQL DBMSs). And that is due mainly to ignorance.

From: Robert Seiner

To: Fabian Pascal

Do I understand you correctly that I should stop using the term "unstructured data"? I just returned from a client after spending part of the day discussing "unstructured data". I do not think I will stop using the term, but I will read your article nonetheless, and perhaps then we can talk about it.

From: Fabian Pascal

Yes, I mean you should stop, because it is nonsense. You can do the same work without using contradictions in terms. Read what I told Ken [above].

A model is defined as structure and behavior (manipulation, which includes integrity).
Structure is what is being manipulated.

Can you tell me what [the title of your article] A Conceptual Model for Unstructured Data means? Ponder that.

Are you going to EDF in Cherry Hill? If so, why don't you attend my session? I cover this very topic.

From: Robert Seiner

Read my articles on that subject and you will know what I mean. I give my very simple definitions of structured and unstructured data. I will not be at EDF. Sorry I will miss your presentation.

From: Fabian Pascal

My point was that whatever you mean, it cannot possibly make sense. If something lacks any structure, it is not modeled or “modelable” by definition; when you model, you structure. A conceptual model of unstructured data is a contradiction in terms. I'm afraid you don't have a proper understanding of what a model is, which is quite common.

From: Robert Seiner

You are entering the term "model" into the equation. Why are you doing that???

From: Fabian Pascal

I am not entering anything. I am only referring to the title of your paper, which is A Conceptual Model for Unstructured Data. That is a contradiction in terms.

Even without the model, unstructured data is a contradiction in terms. If it is unstructured, it is not data; and if it is data, it is structured. By definition.

To reiterate: unstructured means completely unorganized in any way--that's random noise and it cannot possibly be data. Data carries meaning and the meaning is in the organization/structure. Period.

I do not know how to put the obvious more clearly than this.

Posted 10/17/03
