DATABASE DEBUNKINGS - 2000-01 WEEKLY QUOTES

2001 QUOTES


The weekly quotes contain errors that anybody with basic knowledge and understanding of data fundamentals in general, and the relational model in particular should be able to identify. The  purpose of the quotes is two-fold and educational (a) to demonstrate lack of foundation knowledge in the industry and alert practitioners to it (b) as a means to test oneself on grasp of data and relational fundamentals. The reader unable to recognize the errors is advised to pursue proper education, but exercise extreme care with respect to the quality of sources providing such (the site does provide some recommendations).



"Is there any RDBMS system generated unique identification numbers for all the records (like ROWID in Oracle) in MS SQLServer? Is there any physical location pointer which can be viewed using SQL statements like "Select ROWID from TABLE1;" in SQL Server?"

--http://searchdatabase.techtarget.com

10/20/01

"We are trying to figure out the best way to (re)design a database for our QC. department inspection record system. There is a seperate paper inspection record form for each part number that is inspected (over 10,000). This form contains the data for all the inspections done on this part over time. The form (inspection record) contain mostly all common fields except for a handfull of fields that are unique for each part (its paticular inspection requirements). Currently we are developing a flat file database with a seperate table (over 10,000) for each part (looks just like the paper form). The problem is trying to perform a query across all tables. Does anyone have any ideas on how to perform a query with this large table count or a better database design idea?"

--alt.comp.databases

10/13/01

"I have to design a database displaying brake data for every vehicle ever produced, by year, make, and model. How should I design the tables. Any tips will be greatly appreciated."

--http://searchdatabase.techtarget.com

10/06/01

"IT has a reputation for employing reasonably intelligent, objective, dispassionate people."

--Ray Jones, ZDNet

09/29/01

Q: I am interested to hear opinions on denormalising/consolidating tables because the data within them has similar shapes. For instance many entities may have an 'alternative name' table which has a[n] ENTITY_ID, NAME_TYPE column and a NAME column. Well, instead of having one of these tables per entity, why not merge all these tables together and add a column ENTITY IDENTIFIER which will tell you which entity the row belongs to. I can see problems with this due to hotspots, table sizes etc, but the advantages are you only have one table to maintain and write code for.

A: Well, a purist would have a stroke, but whatever works for your database, do it. This denormalization is sometimes done in the logical model, keeping a pure ER diagram with each of the entities separate, then in the logical model, merging them together, as you decide based on anticipated hot spots, etc. I do this very often with "code tables", where the table consists mainly of ID, ShortName, LongName, DateAdded and UserAdded. By adding a CodeType, I can have a generic table. Works very well.

--Online Exchange

09/22/01

"I need an explanation of Codd's 13 rules in human words cause I'm not sure I fully understand them. I also need to know if MS Access violates any of these rules. If so why?"

--Online Message

09/15/01

"I'm trying to create an online TV guide but am having difficulty getting my ERM right in relation to the time aspect of when programs start and finish. Assuming that a channel has many programs but a program only has one channel, can anyone help me in figuring the time aspect of this database. I am desperate to get this working."

--alt.comp.databases


"Cannot find anything to assist me in creating a report in Oracle.. I need to produce a report and write the SQL. Can anyone help. I just need the basic commands with a little explanation."

--http://searchdatabase.techtarget.com

09/08/01

"I use large arrays and I need to store them in tables. It appears easy to map array in tables, but I'm not sure to have the right answer to these questions: Is it better to store arrays (for instance 1D arrays) in one line or one columns of the table?

What are the benefits of storing in line or column? Databases accept only a limitated number of records (what about columns?)

My 1D arrays are large: 1,000,000 elements long and I've got several (20,000!!!). How could I improve performance in term of SQL requests?

I'm a database newbie and I'm very interested to know your point of view."

--comp.lang.java.databases

08/25/01

"Now we embark on the difficulty of creating a database, where it is nearly as easy to use as an OODB, and performs as well as an OODB (of course, depending on how it is used), has advantages of OODB's such as garbage collection and referential integrity, but the application behavior can be structurally independant of its data. IMO, this is the future of databases!"

--comp.databases.object

08/18/01


"Download this object-oriented, hierarchal, relational database for Windows and Pocket PCs."


--www.download.com

08/11/01

"I'm new to this so bear with my stupidity! I have a list of suppliers whose table structure is no problem. However, each supplier needs to have a table of products that they supply. Surely I don't need to create a products table for each supplier? Help! How do I solve this problem elegantly?"

--comp.databases.theory

08/04/01

"The more you drift away from any physical implementation, the more performance is going to suffer ... The choice is between best logical structure or best physical structure, or a compromise."

--comp.databases.object

07/28/01

"Mapping a conventional business onto a U2 database is straightforward and reflects the Post-Relational (PRDMBS) concepts put forward by Ted Codd in the mid-nineties."

--Ray Jones, ZDNet

07/21/01

"Reasons NOT to Use Foreign Keys constraints: There are so many problems with foreign key constraints that we don't know where to start:

Foreign key constraints make life very complicated, because the foreign key definitions must be stored in a database and implementing them would destroy the whole "nice approach" of using files that can be moved, copied, and removed.

* The speed impact is terrible for INSERT and UPDATE statements, and in this case almost all FOREIGN KEY constraint checks are useless because you usually insert records in the right tables in the right order, anyway.

* There is also a need to hold locks on many more tables when updating one table, because the side effects can cascade through the entire database. It's MUCH faster to delete records from one table first and subsequently delete them from the other tables.

* You can no longer restore a table by doing a full delete from the table and then restoring all records (from a new source or from a backup).

* If you use foreign key constraints you can't dump and restore tables unless you do so in a very specific order.

* It's very easy to do "allowed" circular definitions that make the tables impossible to re-create each table with a single create statement, even if the definition works and is usable.

* It's very easy to overlook FOREIGN KEY ... ON DELETE rules when one codes an application. It's not unusual that one loses a lot of important information just because a wrong or misused ON DELETE rule.

The only nice aspect of FOREIGN KEY is that it gives ODBC and some other client programs the ability to see how a table is connected and to use this to show connection diagrams and to help in building applications."

--Product Documentation

07/14/01

"The FOREIGN KEY syntax in MySQL exists only for compatibility with other SQL vendors' CREATE TABLE commands; it doesn't do anything. The FOREIGN KEY syntax without ON DELETE ... is mostly used for documentation purposes. Some ODBC applications may use this to produce automatic WHERE clauses, but this is usually easy to override. FOREIGN KEY is sometimes used as a constraint check, but this check is unnecessary in practice if rows are inserted into the tables in the right order. MySQL only supports these clauses because some applications require them to exist (regardless of whether or not they work). In MySQL, you can work around the problem of ON DELETE ... not being implemented by adding the appropriate DELETE statement to an application when you delete records from a table that has a foreign key. In practice this is as quick (in some cases quicker) and much more portable than using foreign keys."

--Product Designer

07/14/01

"Because relational databases structure data in rows and columns, it's difficult to express the relationship among different data records. XML databases lets data be structured hierarchically, thereby grouping documents that relate to one another ... Advocates of XML databases say relational databases work well for handling transactional data, while XML databases are better for data about multilayered processes that require context."

--Director of application development

07/07/01

NG: When developing object-oriented applications which store data on a database, it is quite normal to use a Relational Database Management System and not an Object Database Management System. When storing data on the database, a conversion between the Object Model and the Relational Model must be made. This conversion must preserve relationship among objects and map class attributes to table columns. But what happens to business rules? As far as I know, they are completely absent in database tables (except for referential integrity) and the database is seen merely as a storage medium. Things which are crucial to database integrity are normally left to the application's responsibility, e.g., specifying that a customer's birthdate should be prior to the current date. It is perfectly possible for a user (accidentaly or intentionally) with the right permissions to log on to the database and destroy database integrity. This raises the problem of whether business rules should also be in the database or only on the application. I'd be glad to read some opinions/experiences on this subject.

JC: The only rules that should reside in a database are referential integrity. (and sometimes that isn't really necessary) It is also best to keep rules out of your data access code. (hardcoding WHERE values.) Business rules should be centralized in Java business objects for better manageability, scalability, etc. Don't let pushy DBAs tell you otherwise. Rules in a database slow down development as well as data access time.

--http://www.theserverside.com

06/30/01

"I'm trying to decide whether to include 10 million+ lines of data in 1 MySQL table, or to split up each user (over 5,000 users) so that each user has his own table. Each user would then only have an average 2,000 lines of data in his table. My main concern in having just 1 table is that it's too busy (too many writes and reads going on). Anyone know the advantages/disadvantages of doing this?"

--Newsgroups: comp.databases

06/21/01

"I read with interest your article about XML data management. I am sure that you are not aware of the product from NeoCore. NeoCore has developed an XML database that has been designed from the ground up to support XML. It is both data- and document-centric hence we call it information-centric. The database system takes inputs and outputs well-formed XML. The biggest benefit of using the NeoCore system is that it totally obviates the need for doing any database design. The structure of the XML document describes the schema of the database. We index the whole document (tags and data) on the fly when we store the document."

--Raman Singh, Product Marketing Manager, Neocore

06/14/01

"Caché4.0 combines the speed and scalability of a transactional multidimensional data model with the power and flexibility of object technology, regardless of how the data is modeled. The system supports tables, objects, and even multidimensional arrays. Caché's proprietary Unified Data Architecture provides automatic access to all data in either object or table form, sidestepping the need to map from one format to the other. Better yet, because Caché stores its data in a multidimensional format, it assures exceptional performance even under heavy load."

--Tim Fielden, Caché binds data, Infoworld

06/07/01

"Is there a way to write an SQL that identifies which associate reports to which manager? Perhaps if there is a number I can call to explain what I am trying to achieve. Environment: PeopleSoft, Oracle, Access etc., any database."

--http://www.sqlteam.com

05/31/01

"I did take a quick look at your site, particularly the details on the ten [seminar topics]. My first impression is that the topics (and the expected level of technical depth) covered in these presentations is not likely to be what our members are used to. We are predominantly Sybase-DBA-centric, with over 90% of our members falling into this category. Also, over the past few years we have focused even more on detailed technical presentations - often given by Sybase engineers themselves - and the [material] on your site look quite generic."

--Database user group

05/24/01

"I need to make a search option into my web site. It needs to display products and the user can use any words. Right now I am using Access database with all my products in one table. I need help in connecting my database to the Internet. I know nothing about database, just creating queries and tables, that's it. I would appreciate if you would e-mail me back."

--Online message


"Please, I want a simple schema example for one distributed database."

--http://www.datamodel.org

05/17/01

I am designing a database system but I am a little unsure about normalization - could anyone tell me if the information below is in third normal form? If it isn't could anyone suggest where I've gone wrong.

Protocol (Protocol Ref., Title, Guideline Link, URL, Format, Category, Abstract)
Users (Protocol Ref., User No.)
User Profile (User No., First Name, Surname, E-mail, Home Hospital, NHS Trust, Position)
Comments (Protocol Ref., User No., Date Made, Main Body)
Dates (Protocol Ref., Date Created, Next Review, Last Review)
Author (Protocol Ref., First Name, Surname, Publisher)
Other Authors (Protocol Ref., First Name, Surname)
Watermark (Protocol Ref., Authority Score, Organisation)
Other Protocols (Protocol Ref., Title, Related Protocol Ref.)

where Protocol Ref. and User No. are key fields.

--comp.databases posting

05/10/01

Senior Data Modeler: The candidate will be working in the central Database support group. The candidate will meet with other departments to help design databases. The candidate needs to have an outgoing personality to be able to sell the department's technology to user departments. The candidate will be taking a high end look at business processes, look at the data and see how it relates to the business issues. The candidate must have some development background, an extrovert personality. Sales background would be helpful.

-Job ad

05/03/01

"I am working on a sports news web site. Currently we are trying to switch over to using an Oracle database to store our stories. Our table is STORY_TEST with multiple fields. One field is STORY_TEXT which contains the body of the news story. What I need to do is to display only the first sentence or first two paragraphs of that field, depending on the importance of the story. I have everything else working properly and would appreciate any help I could get."

--http://searchdatabase.techtarget.com

04/26/01

"I have come across many candidates for job profiles and really have no idea, other then software, what is the difference between a SQL DBA and Sybase DBA. I currently have SYBASE and SQL DBA positions available, but do not see any candidates out on the market and wondered what is trully required to be a SYBASE DBA or SQL DBA. If anyone out there can tell me it would be greatly appreciated, if you know someone who can explain it to me, have them reach me by my email."

--http://searchdatabase.techtarget.com

04/19/01

The most visible limitation of the relational model has been its inability to handle multimedia files, but the importance of this has been overstated. In fact, the relational model has some far more significant limitations that have not yet been challenged: * Every new relational application needs a new set of programs developed from scratch, which is labour-intensive, expensive and wasteful. * Relational applications cannot be readily tailored to the needs of large numbers of individual users, which is an issue for ASPs. * Relational applications cannot record a piece of information about an individual thing that is not relevant to every other thing of the same type. This limits our ability to continually improve customer service levels. * Information about identical things in the real world is structured differently in every relational database, so it is difficult and expensive to amalgamate two databases.

--Simon Williams, The Associative Model of Data

04/12/01

"A traditional normalized structure cannot and will not outperform a denormalized star schema, from a DSS perspective. These schemas are designed for speed, not your typical record style transaction. I have seen and been involved in too many large RDBM implementations that can absolutely not support the stress that a denormalized structure can handle. Supporting users increasing needs of accessing and analyzing information from a normalized schema perspective does not make sense. Normalized models cannot support ad-hoc users needs to extract large volumes of records, aggregating the facts and create hierarchies on the fly. These type of requests puts way to much stress on a model that was originally developed to support the handling of single record transactions."

--Practitioner "with 20 years experience"

03/29/00

"Object-oriented systems can be classified into two main categories--systems supporting the notion of class and those supporting the notion of type ... Although there are no clear lines of demarcation between them, the two concepts are fundamentally different."

--Bertino and Martino, OBJECT ORIENTED DATABASE SYSTEMS

03/22/01

"We designed the data model for an application that is being developed using RUP (Rational Unified Process). The project was outsourced to a vendor but it was decided that the data model would be developed in house. One of the project deliverables is to do an alignment of the data model with the object model. Our dilemna has been the level of details projected in the two models. While the object model is at a very high and conceptual level the data model is at a very detailed level. I would appreciate any input on (1). how to map the Object Model to the Data Model (2). At what point does one stop comparing the models (attribute vs entities)."

--http://www.datamodel.org

03/14/01

"I have the following table:
______________________________
BUSNO ROUTE RUN STOP TIME CODE
______________________________
11 A 1 Ack Rd 9.37
11 A 1 Lin Rd 9.42
11 A 1 Alb Rd 9.50
21 B 2 Col Rd 9.25
21 B 2 Alb Rd 9.30
24 C 1 Ack Rd 9.30
24 C 2 Ack Rd 9.40
______________________________

It is for a bus timetable database project.. I cannot see any functional dependencies in it, as everything seems to be determined by BUSNO, am I right? Is there a way of changing it so I can use normalisation techniques?"

--Online question

03/07/01

Q: I am interested to hear opinions on denormalizing/consolidating tables because the data within them has similar shapes. For instance many entities may have an 'alternative name' table which has a entity_id, name_type column and a name column. Well, instead of having one of these tables per entity, why not merge all these tables together and add a column 'entity_identifer' which will tell you which entity the row belongs to. I can see problems with this due to hot spots, table sizes etc, but the advantages are you only have one table to maintain and write code for.

A: Well, a purist would have a stroke, but whatever works for your database, do it. This denormalization is sometimes done in the logical model, keeping a pure ER diagram with each of the entities separate, then in the logical model, merging them together, as you decide based on anticipated hot spots, etc. I do this very often with "code tables", where the table consists mainly of ID, ShortName, LongName, DateAdded and UserAdded. By adding a CodeType, I can have a generic table. Works very well.

--http://www.datamodel.org

02/28/01

"How do you program a main menu in SQL? This is for a project using an Oracle database."

--http://www.deja.com

"Under what conditions would you model a report as an entity? When wouldn't you?"

--http://www.datamodel.org

02/21/01

Q: I have two tables, FIXTURE and TEAM. A team can have many fixtures and a fixture can have many teams (actually there is two always). I see little reason to make an intermediate table. Can anyone help me?

A: I don't think you have a true many-to-many because, as you said, a fixture will always only have two teams involved. In this case, you don't need an intersection entity to resolve a M:M. What I would do, is relate team to fixture twice each relationship being mandatory. Therefore, when it is physically implemented you would have two columns on the fixture "table", one for each team involved in the fixture event. You could also implement constraints that would ensure you couldn't select the same team for both relationships. Make sense? Typically you would create the intersection entity if there was the slightest of possibilities that it could be more than 2 but for fixtures obviously you don't need to. Does this help?

--http://www.datamodel.org

02/14/01

"I am new to SQL and need to program code to set domains and cardinality for mulitiple tables. Any help would be appreciated!"

--http://www.tek-tips.com/

02/07/01

I would like to know how to normalize a 1NF to 2NF...especially when there are lots of NULL values in the fields.

--http://www.datamodel.org

01/30/01

"I think you mean the only free relational database - which is not the same thing at all. There are much better databases out there. While I would strongly suggest that all database programmers should know relational theory (it helps design immensely), there are a load of far better databases out there. SQL and relational databases put theoretical purity above practicality and functionality, which is why Oracle is such a beast - I could probably write programs that run faster, do more, and handle larger datasets, and all on a system half the size just because I don't believe "relational is best".

--Newsgroup posting

01/21/01

Q: After normalization of my relations I´ve got a lot of foreign keys (FK) in. MySQL was my favorite choice until I read that MySQL doesn't support FKs, it parses the string when you create a table, but ignores it. How does this work ? Should I use Postgresql? I'm confused .

A: If you want to have a database that handles FKs, do not use MySQL.

Q: Okay, if foreign keys can't be used in MySQL, how do you simulate the same functions? Perhaps I don't understand what foreign keys are used for, since as far as I can tell, all manipulation I can think of can be handled with the proper SELECT statements (inner joins). As a side note, what is the use of creating indices for a table? As far as I can tell, SELECT can - if written carefully - do all the necessary cross-referencing, right?

--Online exchange

01/14/01

Q: I have no IT experience, am changing jobs after 20 years, which certification should I take for DBA and will the certification without experience be accepted by workplace? Thanks.

A: ... First, you need to decide among the top systems: Oracle, DB2, SQL Server, Informix, Sybase, etc. Then you'll need to take a training course (thousands of dollars) or read a hell of a lot of books. The cheapest way to proceed is to download, for example, Oracle 8i from the technet.oracle.com site, install it, then start playing with it. Read the reams of free Oracle info on the web, and get your hands dirty. That's the best way to determine if this career is right for you! Good luck.

--Online exchange

01/07/01

"Data components are much more intuitive to work with than rectangular database rows. It is much more natural to call 'employee.getPassword()' than to deal with a relational result set." ... business process components must operate on rectangular rows, rather than on data components, which is highly non-intuitive. Persistence is the storage of enterprise data in a durable data store, such as a database. Any serious commerce application requires persistence ... [A] persistence service should provide ... the ability to override the server-side platform's persistence engine and provide custom persistence routines if necessary. As the database is usually the bottleneck in a commerce deployment, it is desirable to avoid accessing the database whenever possible."

--Newsgroup posting

12/19/00

"I'm working on a project where I need to create a database with 3 different sources of information from EDI data, FTP'd data from someone else's server, and price information from MANMAN (a data system running on Unix). Wouldn't relational databases [sic] be my best bet? I intend to do reporting and lookups against this database. Is there middleware out there that would allow me to extract data from our Unix product and allow me to create/update tables for the other 2 pieces of info?? I'm at a lost as to where to start and what to start with, please help. Thanks."

--Online question

02/11/00

"I have been trying to find the correct way of normalizing tables in Access. From what I understand, it goes from the 1st normal form, to 2nd, then 3rd. Usually, that's as far as it goes, but sometimes to the 5th and 6th. Then, there's also the Cobb 3rd. This all makes sense to me. I am supposed to teach a class in this starting next week, and I just got the textbook. It says something entirely different. It says 2nd normal form is only for tables with a multiple-field primary key. 3rd normal form is only for tables with a single-field key. 4th normal form can go from 1st to 4th, where there are no independent one-to-many relationships between primary key and non-key fields. Can someone clear this up for me please?"

--Online question

11/29/00

A: How is data independence implemented?

ML: What do you mean by 'data independence'?

A: Data independence is the independence between levels in a DBMS, when you can modify a level without interference in another or in your programs.

C: Can you be more specific? I have not heard of this concept in relation to how you have described it. It sounds more like an application design concept rather than an SQL principle.

MW: C's right, it is a design approach. There is [sic] both physical and logical data independence, but there is no way to add data independence to a SQL statement.

--Online exchange

11/22/00

"XML avoids the fundamental question of what we should do, by focusing entirely on how we should do it."

--http://www.metatorial.com

11/13/00

"We are designing a business-to-business application with shoppingcarts, orders, ordelines and other stuff with servlets and JSP.

The java part is clear. But the database... This is the simplified article with a name and a quantity that is available (stock)


----------
ARTICLE
======
id
name
quantity
----------

1. Somebody administrates the stock: he takes things out of the store and adapts the quantity. 2. Somebody gets orders for production, produces it [sic], and then modifies the available quantity. 3. The customer sees the quantity that is in stock, but also how many thingies are in production (he can order articles, but can also place orders for production. When they request more productions, that quantity is not immediatly available in stock. 4. Also a history of productions must be kept.

Pff, we are programmers not databasedesigners [sic]. Can anybody give us some help?"

--Database newsgroup posting

11/01/00

"Work with business users as a Data Modeler/Data Analyst using modeling tools such as: Cayenne, Bachman, or Sterling forward engineering from logical to relational and reverse engineering from existing physical databases, file schemas, IMS.

Candidates must have superior communication skills and have worked on at least 2 major projects (more than 50 entities)."

--Job ad

08/20/00

"XML also doesn't work well with SQL databases, according to Wilke. "XML and SQL are 180 degrees different," he says. "XML has a structure that supports all types, navigation and search, all these things you can't do in SQL. We see a lot of vendors trying to put an XML layer on top of their data products, and often it doesn't work."

Helping to match SQL data to XML and vice versa is still a manual task, because XML is relational data and SQL is flat-file format. There's no easy way to pull information on a single purchase if it's stored in three tables in an SQL database. That has to be done by hand."

--Andy Patrizio, InformationWEEK

08/08/00

"Why We Need XML Server Technology? So we can see the dilemma that RDBMS's are facing. Either the whole XML document is stored in one piece - meaning it is not possible to index individual elements. Or the document is decomposed into single elements which are indexed separately. Retrieval operations must then read all elements individually and join them together – an expensive operation. The RDBMS has to continually put the jigsaw-puzzle together. In addition, protocol rules and triggers are required to maintain the integrity of the object. Locking is another issue. Whilst a document is being updated, it must be locked to prevent other users from accessing it. In an RDBMS where a document is split into many pieces, each single piece is locked separately."

--Software AG, UK

07/25/00

"Currently I started working on [a] project which [has] a client-server application. My concern though is [with] the database. Right now tables are created exactly as a tree which reflects all the objects in my Java application. Can someone tell if I should remodel the database, or there is some other way to work around this?"

--Online message

07/07/00

"... I have a[n] object with a lot of member attributes, say 50 of them. If these attributes do not really stand alone and values are not shared by different objects, I do not really have a case to normalize them out when disigning tables for this object. The table can potentially grow very big, and I will need to search throw [sic] all the records to retrieve relevant object info. Which one of the following is a better choice, that gives me most flexibility and good search performance.

1. Since I do not have a case for normalization, I keep everything in the same table. A table of 50 columns, uniquely identified by a key. I search through this table when I need to, without any join.

2. I split the big tables into portions, shared the same unique key. The argument is that the tables get smaller, then you get better performance when searching on a perticular column, and smaller tables are more flexible and easier to maintain. But I am worried about the join that will have to take place when searching for multiple values, which might kill the performance.

--Online message

06/19/00

"I need to store 40 pieces of unrelated information. Is it better to create [one] table [with one] record [and] 40 fields, or create [one] table [with] 40 records [and one] field?"

--Online message

06/04/00


The weekly quotes contain errors that anybody with basic knowledge and understanding of data fundamentals in general, and the relational model in particular should be able to identify. The  purpose of the quotes is two-fold and educational (a) to demonstrate lack of foundation knowledge in the industry and alert practitioners to it (b) as a means to test oneself on grasp of data and relational fundamentals. The reader unable to recognize the errors is advised to pursue proper education, but exercise extreme care with respect to the quality of sources providing such (the site does provide some recommendations).


[2002]  [2003]  [2004]  [2005]  [2006