Friday, March 18, 2022

SMS: Entities and Records


Note: "Setting Matters Straight" is a new format: I post on LinkedIn an online Q&A involving data fundamentals to encourage readers to test their foundation knowledge, which they can then compare with our debunking here, where we confirm what is correct and correct what is fallacious (with clarifications, wherever necessary). For in-depth treatment check out the POSTS and our PAPERS, LINKS and BOOKS (or organize one of our on-site/online SEMINARS, which can be customized to specific needs).

Q: “What is the relationship between an entity and a record?”

A: “In the context of a database design, an ‘entity’ is a type or category of persons, places, things or events. It’s a collectivisation of the nouns in a system about which you wish to keep data. For example, Employee might be the name of an entity in your system. A ‘record’ is a collection of data about a specific entity, a particular person or place, an identifiable thing, or a single event. For example, Name: ‘Dave Voorhis’, StartYear: 2019, Salary: £1,398,293 might be a record of one Employee entity in your system.”

A: “Database, file, and recordset are basically the same thing. They are collections of information or data. Each database or file or recordset typically has some sort of common purpose or definition. Like a database (relational, hierarchical, etc.) of data of a business process. A File is again a collection of data such as all transactions to be posted. A recordset is also basically a file.

Entity and table are basically the same thing. While you have the grouping of all the data, and entity (logical view) and a table (physical view) are the same. As Dave said, it is a logical grouping of a specific piece of data.

File, recordset, record, row or line are basically the same. A .csv file is a grouping of records. A file is a grouping of records. A row is an individual grouping of data from a relational database.

The last is element or attribute or field. This is the individual piece of data like Transaction_Amount or First Name.”
A simple and the answer oversimplifies. But things seem simple only in the absence of foundation knowledge. Practitioners use different terms for the same thing, or the same word for different things, but that must be corrected, not accepted or validated.


DBDebunk was maintained and kept free with the proceeds from my @AllAnalitics column. The site was discontinued in 2018. The content here is not available anywhere else, so if you deem it useful, particularly if you are a regular reader, please help upkeep it by purchasing publications, or donating. On-site seminars and consulting are available.Thank you.


03/06 TYFK: Relationships -- Uniqueness and Attribute Constraints

02/19 OBG No Understanding without Foundation Knowledge Part 6: Debunking an Online Exchange 5

02/04 OBG No Understanding without Foundation Knowledge Part 5: Debunking an Online Exchange 4

01/30 TYFK Nobody Understands What a Data Model Is

01/21 Read My Lips: If There's NULLs, It's Not Relational

- 08/19 Logical Symmetric Access, Data Sub-language, Kinds of Relations, Database Redundancy and Consistency, paper #2 in the new UNDERSTANDING THE REAL RDM series.
- 02/18 The Key to Relational Keys: A New Understanding, a new edition of paper #4 in the PRACTICAL DATABASE FOUNDATIONS series.
- 04/17 Interpretation and Representation of Database Relations, paper #1 in the new UNDERSTANDING THE REAL RDM series.
- 10/16 THE DBDEBUNK GUIDE TO MISCONCEPTIONS ABOUT DATA FUNDAMENTALS, my latest book (reviewed by Craig Mullins, Todd Everett, Toon Koppelaars, Davide Mauri).

- To work around Blogger limitations, the labels are mostly abbreviations or acronyms of the terms listed on the
FUNDAMENTALS page. For detailed instructions on how to understand and use the labels in conjunction with that page, see the ABOUT page. The 2017 and 2016 posts, including earlier posts rewritten in 2017 were relabeled accordingly. As other older posts are rewritten, they will also be relabeled. For all other older posts use Blogger search.
- The links to my AllAnalytics columns no longer work. I re-published only the 2017 columns @dbdebunk, and within them links to sources external to AllAnalytics may or may not work.

I deleted my Facebook account. You can follow me @DBDdebunk on Twitter: will link to new posts to this site, as well as To Laugh or Cry? and What's Wrong with This Picture? posts, and my exchanges on LinkedIn.


A database is (1) a computable formal representation at the logical level of a conceptual model of reality (2) implemented physically in hardware. The model is the meaning (semantics) of the data in the database. Culture and language induce a view of reality as consisting of three primitive concepts -- objects, properties and relationships. In conceptual modeling for database design they are as follows:

  • Objects:

- entities
- entity groups
- multigroup.

  • Properties:

- of entities (individual)
- of groups and multigroup (collective).

  • Relationships:

- intragroup: among properties and entities
- intergroup: among entities and aggregates

Logical database design is the use of a formal data model to formalize conceptual models of reality as logical models for database representation. Using the RDM, a database formalizes as a multigroup -- a collection of related groups:

Groups formalize as database relations:

  • entity properties -> attributes (defined on domains)
  • entities -> tuples
  • relationships --> constraints

Note very carefully:

  • Intragroup relationships among entities are collective properties of the group;
  • Intergroup relationships among groups are collective properties of the multigroup.

There is a 1:1 correspondence between the elements of a conceptual model and those of a logical database because the latter represent the former. This may, but does not have to be the case for logical and physical elements because of RDM's support of physical independence: implementers are free to use physical elements -- storage and access methods (files, fields, records, indexes, hashes, clustering) to optimize performance orthogonally to logical design. Unfortunately, SQL DBMSs are not truly relational and have weak support of physical independence.

Setting Matters Straight

An entity is the most primitive object in a conceptual model that, indeed, is something uniquely identifiable of interest to be recorded in the database. An entity type is essentially a collection of properties observed to repeatedly occur together and given a name for convenience. A record is a stored set of data values. It is "about an entity" if and only if it is a 1:1 implementation of a tuple of either  a base relation or (2) of a derived relation in a true relational system that supports multi-relation results and a relational algebra with 5NF closure (in which all relations are in 5NF). Otherwise otherrwise, it may be "about multiple or partial entities", as is the case in SQL systems (why?).

Files and records are physical means to implement a logical databases -- there can be multiple files per database (even multiple databases per file are possible in theory, but rare in practice). A file is a set of records, but a set of records can be part of one or more files.

As we have seen, an entity is an element of the conceptual model. A R-table is a visualization of a base database relation which represents a group of entities of a type, a row of which visualizes a tuple that represents an entity at the logical level. Visualization should not be confused with implementation: displaying a relation as a R-table on some physical medium (paper, screen) does not necessarily mean that the data is stored as a table.

An attribute is a formal representation at the logical level of a property of entities of some specific type in the conceptual model. There is no "individual piece of data" in RDM: a data value a tuple value that is a valid attribute value and represents a property value of entities of a type. 

A field is an application, not database term and it commonly refers to the visualization of an attribute value on some physical medium (data entry form).


If logical database design is formalization of conceptual models and physical implementation is representation of logical models in hardware, it should be obvious why our claim that conflation/confusion of levels of representation amounts to lack of understanding of what database management is is not an exaggeration. That is why, to help preempt it, we insist on using a three-fold terminology: conceptual modeling, logical database design and physical implementation.





No comments:

Post a Comment

View My Stats