Sunday, November 16, 2014

Natural, Programming and Data Language

William Sisson writes:
Thank you for posting the Dijkstra piece "On the foolishness of natural language programming". It is a very interesting read.

I agree completely with Dijkstra that using natural language as a programming language is neither feasible nor desirable. It is not possible to pose precise questions in natural language and if you pose an imprecise question then you cannot expect a precise answer.
There is the separate question of whether it would be helpful to make formal languages more accessible in their syntax and terminology, not with the goal of making programming easier, but in order to ease the communication between the users of the system and those who write the formal definition. One might see the decision in RM to use named attributes rather than relying on the position of terms in the predicate as an example of this approach of familiarisation. If a user can look at a piece of the formal language and get at least some idea of its meaning and how it relates to less formally expressed rules, then the precise implementation of the required rules might be made easier.
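To make the point about named attributes concrete, here is a minimal sketch in Python. The customer data, the `Customer` type and the two helper functions are hypothetical, invented purely for illustration; this is not how any relational DBMS represents tuples internally.

```python
from typing import NamedTuple

# Positional representation: the reader must remember that the third
# term is the city. Nothing in the row itself says so.
positional_row = ("C042", "Acme Ltd", "London")


class Customer(NamedTuple):
    """Named representation: every value is paired with an attribute name."""
    customer_id: str
    name: str
    city: str


named_row = Customer(customer_id="C042", name="Acme Ltd", city="London")


def city_by_position(row: tuple) -> str:
    # Correct only as long as the ordering convention is known and respected.
    return row[2]


def city_by_name(row: Customer) -> str:
    # Reads closer to the informal rule "the customer's city".
    return row.city
```

Both functions return the same value; the difference lies only in how much a reader can infer from the expression without being told the ordering convention.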

Of course the person writing the formal definition must be fully aware of the imprecision of the natural language expression of the rule and know, for example, that logical “and” has a different meaning from “and” in natural language.
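A small sketch of the "and" problem, in Python, assuming a hypothetical informal request such as "list the customers in London and Paris". The data and the variable names are made up for illustration.

```python
# Hypothetical customer data, one city per customer.
customers = [
    {"name": "Acme", "city": "London"},
    {"name": "Bolt", "city": "Paris"},
    {"name": "Cato", "city": "Rome"},
]

# Literal transcription of the natural-language "and" as logical AND:
# no single customer row can have both cities, so nothing qualifies.
literal = [
    c["name"]
    for c in customers
    if c["city"] == "London" and c["city"] == "Paris"
]

# What the speaker almost certainly means is logical OR:
# customers whose city is London, together with those whose city is Paris.
intended = [
    c["name"]
    for c in customers
    if c["city"] == "London" or c["city"] == "Paris"
]
```

The literal reading yields an empty result, while the intended reading needs a disjunction. This is the kind of gap the person writing the formal definition has to catch.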

Obviously the goal of predicate logic was to remove all context specific elements that might distract from understanding the logical consistency of an argument from a purely formal point of view. So probably instead of calling customer “customer” we should use some abstract variable like x. However, from a practical point of view we want to make the interpretation of the formal language easier (more “user-friendly” if you like) and consequently we denote predicates, variables, names and functions with terms that are familiar to the user. (While of course acknowledging that how these terms may be used among the whole group of users will not be entirely consistent.)
I see analysis as the process of taking informal statements made by users and gradually refining them into formal statements. I think making the formal language more readable can be an aid in this process, by making statements in the formal language at least to some extent understandable for the users.

Following Codd and David McGoveran, I distinguish between a programming and a data language, even if the latter is a component of the former. A programming language formulates algorithms (procedures), a data language describes data results. That's one of the several reasons Codd opted for a data sub-language, rather than a fully computational language. So the question for us, database professionals, is the adequacy of natural language for data languages.

Indeed, but the communication is not only between the formal definers and users--and as you well know, while there are many business modelers, there aren't many truly formal ones in the proper sense--but also between users and the DBMS. Since the latter does not understand semantically, as users do, it views column names as X, Y, ... regardless of what names we use, and that is also why the rest of the predicates--the verbs--is left out of the R-table representation. This is not only to prevent distraction, but for versatility, which is what formality confers: results depend only on form, and not on some specific context (meaning).
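One way to see that results depend only on form is the following rough sketch in Python. The `restrict` and `rename` helpers and the employee data are hypothetical illustrations, not a real DBMS interface, and a list of dictionaries is only a loose stand-in for an R-table.

```python
def restrict(rows, attribute, value):
    """Keep only the rows whose `attribute` equals `value`."""
    return [row for row in rows if row[attribute] == value]


def rename(rows, mapping):
    """Replace every attribute name according to `mapping`."""
    return [{mapping[name]: val for name, val in row.items()} for row in rows]


# Names that are meaningful to users...
employees = [
    {"employee": "Ann", "department": "Sales"},
    {"employee": "Bob", "department": "Audit"},
]

# ...and the abstract names the system might as well be using.
mapping = {"employee": "X", "department": "Y"}

meaningful = restrict(employees, "department", "Sales")
abstract = restrict(rename(employees, mapping), "Y", "Sales")
```

Renaming the meaningful result gives exactly the abstract result: the operation never consulted what "employee" or "department" mean, which is why the burden of meaning stays with modelers and users.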

This does not relieve users, however, from knowing and understanding the meaning of the data! This is, in fact, a core problem in industry practice, as I have recently demonstrated in my All Analytics blog. It's not so much that the data language is not very "natural" (although SQL is not a well designed data language), but that both modelers and users fail to be precise about, document and rely on table meanings (and ensure that they are well designed R-tables) to formulate queries and interpret results sensibly.

Indeed, that is the approach advocated in my paper #1, Business Modeling for Database Design. If professionals were exposed to a proper introduction to predicate logic and relational theory--and I stress "proper" to convey rigorous, yet pedagogically effective and practically relevant, a tall order and extremely scarce these days--rather than almost exclusively to tools, it would, IMO, better address the problem than making data languages "natural".

That would also make modelers and users much less tolerant of poor implementations of the relational model, let alone non-relational alternatives, and incentivize vendors to offer TRDBMS's.

