Sunday, February 14, 2016

Healthcare, Data Fundamentals and the PASS Summit (UPDATED)

Years ago, in an online exchange, I argued that working with SQL DBMSs without knowledge and understanding of data and relational fundamentals is a costly proposition. An Oracle practitioner replied that "they train doctors on how to use medical devices, not teach them the theories behind them". I asked him what doctors learn in their six years of medical school, but got no reply.

I have documented and debunked for decades the substitution of tool training for education and the "cookbook approach" to database practice that it produces. While I have become more jaded, it is still difficult to run so frequently into something like this:

Foreign keys are wonderful. Put as many of them in every table. They improve performance and lead to conformance with the current leading database concepts:
Foreign Key Relationship Limits - A table can reference a maximum of 253 other tables and columns as foreign keys (outgoing references). SQL Server 2016 Community Technology Preview 3.2 (CTP 3.2) increases the limit for the number of other table and columns that can reference columns in a single table (incoming references), from 253 to 10,000.
--Israel SQL Server User Group
Decades ago, professional conferences, while including exhibitions dedicated to vendors and products, at least had sessions by speakers who were mainly independent experts and mostly product-agnostic. Over time, however, vendors gradually took them over and, ultimately, replaced them with their own conferences altogether. Today, these consist of "sound-bite" sessions of 30-60 minutes about the latest buzzword or fad, given by either vendor personnel or consultants closely associated with the product. Not only are these too short to confer any serious knowledge, they are also entirely devoid of fundamentals, and few are much more than marketing vehicles.

Take the PASS Summit, for example, the top Microsoft conference for SQL Server DBAs, developers and users. Here are the speaker criteria for one-day pre-cons:
1. You have presented at least one pre-con at a conference such as PASS Summit, SQLRally, SQLSaturday, TechEd/Ignite, PDC, DevTeach/SQLTeach, SQLBits, SQL Intersections, or SQL Connections
and at least five of the following:
2. You have been, or are currently, an MCT (Microsoft Certified Trainer).
3. You have been, or are currently, a SQL Server trainer (non-MCT) and have taught multi-day SQL Server training sessions for a training or consulting company.
4. You have taught at least one college-level, credit-based class at an accredited college or university.
5. You have presented at least eight formal SQL Server-related presentations (1 hour or more) during the past 2 years.
6. You are a Microsoft MVP or RD.
7. You are, or have been, a Microsoft employee specializing in SQL Server.
8. You have been working primarily with SQL Server for at least 5 years.
9. You have attained one or more Microsoft SQL Server certifications: MCSE, MCTS, or MCITP.
10. You have attained the Microsoft MCM/MCSM and/or MCA certification.
Now, it is natural, and there is nothing wrong with, requiring product expertise of speakers at a SQL Server conference. But is it sensible to make it the exclusive requirement? Practically none of the hundreds of sessions is dedicated to the very foundation claimed for the product: the relational data model (RDM).

Consider the following, by the rare product expert who knows something about and respects the RDM:
SQL, in turn, is based mainly on the relational model—a semantic model representing data that was created by Edgar F. Codd in 1969 ... The relational model defines an important principle called the “physical data independence.” What it means is that the model and the language based on it define the logical aspects of the data and its manipulation--in other words, the meaning ... A good example for a violation (meaning, incorrect expectations) of the physical data independence principle is when people query a table without an ORDER BY clause and assume that the data will be returned in clustered index order ... A similar violation of the principle is when people update data and the solution’s correctness relies on the data being updated in clustered index order (do a Web search on “quirky update” to see what I mean). --Logical Query Processing and What It Means To You
First, "based mainly on the relational model" glosses over the very important fact that SQL DBMSs are not RDBMSs. SQL practitioners confuse SQL with the RDM and are unaware of the practical implications thereof, particularly that both product implementations and user practices fail to guarantee two of the most critical benefits of the RDM: logical and semantic correctness. How many conference speakers, let alone attendees, know and understand this? (Incidentally, did you know that the latter requires 5NF relations?)

Second, the RDM's reliance on first order predicate logic (FOPL) allows a true RDBMS--though not a SQL DBMS!--to guarantee correctness of query results independent of the meaning of the data, which FOPL symbolizes away. That is what gives logic and, therefore, the RDM, its versatility. In other words, logical means "without reference to meaning". Meaning comes into play only in the interpretation of results.

Third, user expectations of correctness based on physical features reflect the very lack of relational knowledge that I have been deploring for decades, but they are not violations of physical data independence (PDI). PDI is the insulation by the DBMS of users and applications from physical implementation details. A DBMS violates PDI when it exposes storage and access methods to users and applications, conditioning logical features on them. One example is SQL DBMSs allowing duplicates, which makes access to them dependent on physical features such as addresses in storage (e.g., ROWID in Oracle); another is keys (logical) that depend on indexes (physical): dropping the index deletes the key. (Don't get me wrong: I am not saying that keys should not be indexed for performance purposes, only that their existence should not depend on it.)
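
The two examples in the paragraph above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses SQLite through Python's standard sqlite3 module as a stand-in for a SQL DBMS, SQLite's implicit rowid is merely a loose analogue of Oracle's ROWID, and the table and index names (emp, dept, dept_key) are hypothetical.

```python
import sqlite3


def duplicates_need_rowid():
    """Duplicate rows can be told apart only via the implicit rowid."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table with no key declared, so duplicates are accepted.
    con.execute("CREATE TABLE emp (name TEXT, dept TEXT)")
    con.executemany(
        "INSERT INTO emp VALUES (?, ?)",
        [("Smith", "Sales"), ("Smith", "Sales")],
    )
    # One fact was recorded twice: two stored rows, one distinct row.
    total = con.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
    distinct = con.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT name, dept FROM emp)"
    ).fetchone()[0]
    # A predicate on column values (name = 'Smith') matches both rows, so
    # removing just one of them goes through the system-supplied rowid.
    con.execute("DELETE FROM emp WHERE rowid = (SELECT MIN(rowid) FROM emp)")
    remaining = con.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
    con.close()
    return total, distinct, remaining


def key_depends_on_index():
    """Uniqueness enforced solely by an index disappears with that index."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE dept (dept_no INTEGER, name TEXT)")
    # In this sketch the "key" exists only as a unique index.
    con.execute("CREATE UNIQUE INDEX dept_key ON dept (dept_no)")
    con.execute("INSERT INTO dept VALUES (1, 'Sales')")
    try:
        con.execute("INSERT INTO dept VALUES (1, 'Marketing')")
        rejected_with_index = False
    except sqlite3.IntegrityError:
        rejected_with_index = True
    # Dropping the physical structure silently removes the logical constraint.
    con.execute("DROP INDEX dept_key")
    con.execute("INSERT INTO dept VALUES (1, 'Marketing')")
    rows_with_same_key = con.execute(
        "SELECT COUNT(*) FROM dept WHERE dept_no = 1"
    ).fetchone()[0]
    con.close()
    return rejected_with_index, rows_with_same_key


if __name__ == "__main__":
    print(duplicates_need_rowid())
    print(key_depends_on_index())
```

In the first function the only handle on one of two identical rows is a value supplied by the implementation rather than by the data. In the second, the duplicate key value is rejected while the index exists and accepted once it is dropped. Other products differ in the details, so the sketch should be read as a demonstration of the kind of dependence described, not as a statement about any particular DBMS.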

The author has all the speaker qualifications and then some, and does not share the common disregard for the RDM. And yet...

David McGoveran is
(1) the person who first insisted to the CEO of Sybase, the developer of the initial SQL Server, that he should provide a PC version;
(2) the first person to port any SQL Server code to the PC;
(3) the first person to write an in-depth book on SQL Server;
(4) the person who provided the early Microsoft team with consulting on its version.
When I passed him a draft of this post for comments, he found the speaker requirements "bizarre". He meets none of them (neither would Codd or Date, for that matter; I meet only 1, 4 and 5). And PASS Summit is hardly unique.

As to medicine, it seems to be going the way the Oracle practitioner described (note the job requirements):

The goal of the Health Content Researcher is to decode complex, confusing medical terms and organize them into a coherent structure, which will ultimately help patients in their search for healthcare providers.

The Content Researcher will accomplish this task by researching, labeling, and categorizing medical procedures as well as other health-related concepts. Through web research, the successful candidate will develop a clear understanding of each concept and its nuances in order to write a concise yet apt label to describe it. In constructing labels, the Content Researcher will need to convey how they relate to similar concepts and keep track of synonyms (like “CABG” for “heart bypass surgery”).

In addition to creating labels, the Content Researcher will categorize procedures into a taxonomy (way of grouping related concepts) according to how they are classified in the medical field. Taxonomies are important tools in information architecture, and they are integral to our doctor search service.

Sunnyvale, CA (near Mathilda Ave and El Camino Real)

$3,000 per month for full-time work (40 hrs/wk). The company also offers a stipend for benefits (including health insurance). Compensation is not open to negotiation.

This position is available immediately.

Legal right to work within the US
No prior experience necessary


  • Research and review source material on health conditions, procedures, and other health-related concepts
  • Synthesize incomplete and sometimes contradictory health information from around the web
  • Develop a clear understanding of each health-related concept and its nuances
  • Craft concise, specific labels to describe each concept
  • Keep track of related concepts and synonyms (e.g. "CABG" for "heart bypass surgery")
  • Construct taxonomies for concepts according to how they are classified in the healthcare field

Among the hottest trends in healthcare are Big Data, data science and machine learning. Healthcare startups are sprouting like mushrooms and are full of software engineers, coders and data scientists--the only expertise missing is medical. Younger generations of doctors may know all about EMRs and the latest medical devices, but their diagnosis and treatment skills leave much to be desired.

See also  I'd like to submit a precon for the @sqlpass...

