Entity Relationship Diagram

Miscellaneous

James V. Luisi, in Pragmatic Enterprise Architecture, 2014

8.1.2 Nonsensical Buzzwords

As with any taxonomy, terms are bound to emerge that misrepresent information and mislead others. Most are accidental, coming from individuals who see one side of an issue but lack the experience of having encountered the other sides that help put it in perspective. We see this in books and commonly on the Web.

While it is neither practical nor, for that matter, possible to identify them all, we will share one of our favorites.

My recommendation is to ask lots of questions to learn what any buzzword means. If you don't get an explanation that makes it crystal clear, keep asking questions. The others in the room probably have no idea what the buzzword means either.

8.1.2.1 Object Relational Impedance Mismatch

Entity relationship diagrams were in use nearly a decade before IBM announced their first relational database management system. Entity relationship diagrams were routinely used with:

"hierarchical databases" (e.g., DL/I and IMS),

"inverted list databases" (e.g., Adabas), and

"network databases" (e.g., IDMS).

The point here is that entity relationship diagrams provide a basic method with which to depict collections of data points and the relationships that those collections have with one another.

It is therefore amusing how often one can find statements on the Web, including in Wikipedia, that state that entity relationship diagrams can only depict designs for relational databases. But that said, it gets even better.

Now that the reader is knowledgeable about many of the differences between information systems and control systems, it is easy to understand how object-oriented paradigms originated with control systems and were then adapted to information systems.

Yes, the architectural foundation between the two paradigms is different, but that's only because there are no tangible things that you can consistently touch in an information system.

Stable collections of data points within information systems are "objects" around which the application may be architected, as with collections of data that are identified within a logical data architecture. This goes down to the smallest collection of data points for which there are many synonyms, which include:

record,

tuple,

entity,

table, and

object.

A conceptual or logical data model, as represented by an entity relationship diagram, is suited to model the data, regardless of what anyone decides to call the collections of data points. In other words, relational has nothing to do with it.

Now that there are a few generations of developers who know only object-oriented and relational, and who have seen the differences between object-oriented control systems and relational database-oriented information systems, they have coined a new term: "object relational impedance mismatch."

The following are examples of what has been used as justification for the existence of "object relational impedance mismatch."

Encapsulation: Object-oriented programming languages (e.g., Ada) use concepts to hide functionality and its associated data within the architecture of the application. However, this reason speaks to application architecture, not database architecture.

Accessibility: Public data versus private data, as determined by the architecture of the application, is introduced as additional metadata, which is not pertinent to data models.

Interfaces: Objects are said to have interfaces, which simply confuses interfaces that exist between modules of applications with data objects.

Data type differences: Object-oriented databases support the use of pointers, whereas relational does not. From the perspective of database management system architectures, the architectures that support pointers include hierarchical, network, and object oriented, whereas inverted list, relational, and columnar do not.

Structural and integrity differences: Objects can be composed of other objects. Entity relationship diagrams support this construct as well.

Transactional differences: The scope of a transaction as a unit of work varies greatly with that of relational transactions. This is simply an observation of one of the differences between control systems and information systems. What does "transaction" even mean when you are flying a B2 Stealth bomber, and if the transaction rolls back, does that make the plane land backwards where it took off from?

Okay, I can predict the e-mails that I am going to receive on this last one, but you have to inject a little fun into everything you do.

URL:

https://www.sciencedirect.com/science/article/pii/B9780128002056000081

Designing and Creating SQL Server Databases

In Designing SQL Server 2000 Databases, 2001

Entity-Relationship Diagrams

Entity-relationship (ER) diagrams are the blueprints for database applications in OLTP systems. The process of creating ER diagrams is well documented and involves:

Identifying database entities (tables)

Defining entity attributes (columns)

Identifying unique row identifiers (keys)

Defining relationships between entities

The data model then goes through a process called normalization that includes three primary rules for efficient data storage. For a detailed description of the normalization process, refer to the "Database Normalization" sidebar in this chapter.

Database Normalization

Designing your database model is dependent on how your database will be used. OLTP systems are designed around a relatively standardized process called normalization. After you have completed entity discovery, that is, identifying the logical data entities in your system, the normalization rules provide guidelines for fine-tuning your data model to optimize performance, maintenance, and querying capabilities. Having said that, complete normalization is not always the best solution for your database. OLAP systems and some OLTP application requirements often result in a denormalized database or at least a denormalized segment. In OLAP solutions that typically contain large amounts of historical data, the denormalized structure, including multiple copies of data and derived columns, can significantly increase analysis performance and justify its violation of normalization rules. The choice of complete normalization is always dependent on how your database will be used.

Normalization is a process of organizing the tables in a database into efficient, logical structures in order to eliminate redundant data and increase integrity. The physical results of normalizing a database are a greater number of smaller tables that are related to each other. Although there are up to seven normalization rules, called forms, the first three forms of normalization are the most significant and commonly used. The remaining normal forms are primarily academic. The primary normal forms are:

First normal form (1NF) Eliminate repeating groups and nonatomic attributes (or fields that contain multiple values).

Second normal form (2NF) Eliminate partial dependencies.

Third normal form (3NF) Eliminate nonkey dependencies and derived columns.

In order for the tables in your database to comply with the 1NF:

They must have no repeating groups.

Each field must be atomic (contains no multivalue data).

So what does that mean? The easiest way to understand this concept is to take a look at an example of a table that needs to be normalized. Table 4.5 is an unnormalized table representing cities.

Table 4.5. An Unnormalized City Table

State (Key) | Governor | City (Key) | Founded | YearsOld | Founders | Suburb1 | Suburb2
NY | Pataki | Roberton | 12/28/1941 | 59 | Erwin, Patton | Michaelville | Alexopolis
NY | Pataki | Willville | 8/24/1932 | 68 | DeWolf | Auburn | (none)

After you inspect the table for a moment, you will see that Suburb1 and Suburb2 are a repeating group. One of the reasons that repeating groups are troublesome is that they restrict how far you can extend your database to include all related information. After all, what would you do with a third suburb in this case? Another problem is that unnormalized database designs waste space. In this example, cities that have no suburbs or only one will not fully utilize the space allocated for the row. Finally, these designs make searching and sorting cities by their suburbs difficult. To eliminate the repeating group, you need to create a new entity called Suburb and form a relationship between the two entities.

There is another problem with the City table: Founders is not atomic, because it contains more than one value that can be split. Again, this design will prevent you from sorting or searching the data effectively. We could try to resolve this problem by splitting Founders into Founder1 and Founder2. However, this would create a repeating group like the one we had with Suburb1 and Suburb2. Once again, the solution is to add a new entity that represents the founders related to a city. After we add the two entities, we will comply with the 1NF and our tables should look like Tables 4.6–4.8.

Table 4.6. A Revised City Table

State (Key) | Governor | City (Key) | Founded | YearsOld
NY | Pataki | Roberton | 12/28/1941 | 59
NY | Pataki | Willville | 8/24/1932 | 68

Table 4.7. A Founder Table

State (Key) | City (Key) | Founder (Key)
NY | Roberton | Erwin
NY | Roberton | Patton
NY | Willville | DeWolf

Table 4.8. A Suburb Table

State (Key) | City (Key) | Suburb (Key)
NY | Roberton | Michaelville
NY | Roberton | Alexopolis
NY | Willville | Auburn
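The move from Table 4.5 to Tables 4.6–4.8 can be sketched in a few lines of Python. This is only an illustration, assuming the unnormalized rows are held as dictionaries; the function name to_first_normal_form is hypothetical and not part of the original text.

```python
# Hypothetical in-memory copy of Table 4.5; column names follow the text.
unnormalized = [
    {"State": "NY", "Governor": "Pataki", "City": "Roberton",
     "Founded": "12/28/1941", "YearsOld": 59,
     "Founders": "Erwin, Patton",
     "Suburb1": "Michaelville", "Suburb2": "Alexopolis"},
    {"State": "NY", "Governor": "Pataki", "City": "Willville",
     "Founded": "8/24/1932", "YearsOld": 68,
     "Founders": "DeWolf",
     "Suburb1": "Auburn", "Suburb2": None},
]


def to_first_normal_form(rows):
    """Split the repeating group and the multivalued field into their own tables."""
    city, founder, suburb = [], [], []
    for row in rows:
        key = (row["State"], row["City"])
        city.append({"State": row["State"], "Governor": row["Governor"],
                     "City": row["City"], "Founded": row["Founded"],
                     "YearsOld": row["YearsOld"]})
        # Founders is nonatomic: emit one row per founder.
        for name in row["Founders"].split(","):
            founder.append(key + (name.strip(),))
        # Suburb1/Suburb2 form a repeating group: emit one row per suburb.
        for column in ("Suburb1", "Suburb2"):
            if row[column]:
                suburb.append(key + (row[column],))
    return city, founder, suburb


city, founder, suburb = to_first_normal_form(unnormalized)
```

Because each suburb becomes its own row, a third suburb needs no schema change, which is the extensibility problem the repeating group caused.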

We could not conform to the 2NF until we had complied with the 1NF because each layer of normalization builds on the previous layers. In order to conform to the 2NF, the tables must follow these guidelines:

All nonkey fields must be related to all key fields.

The tables must comply with the rules of the 1NF.

When you look at our current schema, you can see that City is in violation of the 2NF because Governor is dependent only on State, not on City. Once more, the solution is to add a new entity that stores the governor information about the state. One of the benefits of conforming to the 2NF is that you will remove repetitive data, because you have to store the governor of New York only once. This saves storage space and keeps the data consistent because it is entered and stored only once. Tables 4.9–4.12 show your tables in the 2NF.

Table 4.9. The 2NF City Table

State (Key) | City (Key) | Founded | YearsOld
NY | Roberton | 12/28/1941 | 59
NY | Willville | 8/24/1932 | 68

Table 4.10. The 2NF State Table

State (Key) | Governor
NY | Pataki

Table 4.11. The 2NF Founder Table

State (Key) | City (Key) | Founder (Key)
NY | Roberton | Erwin
NY | Roberton | Patton
NY | Willville | DeWolf

Table 4.12. The 2NF Suburb Table

State (Key) | City (Key) | Suburb (Key)
NY | Roberton | Michaelville
NY | Roberton | Alexopolis
NY | Willville | Auburn
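A partial dependency such as Governor depending on State alone can be probed against sample data. The Python sketch below is a rough aid, not a proof: sample rows can only refute a dependency, never establish it. The third row (an Ohio city that shares the name Roberton, with a governor called Smith) is invented purely so that City alone does not trivially determine everything, and the helper name determines is hypothetical.

```python
def determines(rows, lhs, rhs):
    """True if, within these rows, equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


city_1nf = [
    {"State": "NY", "Governor": "Pataki", "City": "Roberton", "Founded": "12/28/1941"},
    {"State": "NY", "Governor": "Pataki", "City": "Willville", "Founded": "8/24/1932"},
    # Hypothetical row, added only to make the check meaningful.
    {"State": "OH", "Governor": "Smith", "City": "Roberton", "Founded": "1/1/1950"},
]

key = ("State", "City")
# Nonkey columns that appear to depend on only one part of the composite key.
partial = [
    (part, column)
    for column in ("Governor", "Founded")
    for part in key
    if determines(city_1nf, (part,), (column,))
]
```

With these rows, the only pair reported is State and Governor, which matches the violation described in the text; Founded needs both State and City.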

After we have completed normalization through the 2NF, we can further normalize by reviewing the rules of the 3NF. In order to conform to the 3NF, the tables must follow these guidelines:

Nonkey fields cannot be dependent on any other nonkey field.

Remove any derived or computed columns.

The tables must comply with the rules of the 2NF.

If you review our current schema, you'll see one obvious violation of the 3NF: the YearsOld column in our City table. YearsOld is a derived column based on the current date and the Founded column of the City table. Physically storing this information is both a waste of space and a potential maintenance and accuracy problem. To comply with the 3NF, we must remove the YearsOld column. Maintaining derived data like these can introduce inaccuracies, producing application failures. Consider if this field was used to determine a city's eligibility for financial rewards that are based on a city's anniversary. Now our City table in the 3NF will look like the one in Table 4.13.

Table 4.13. The 3NF City Table

State (Key) | City (Key) | Founded
NY | Roberton | 12/28/1941
NY | Willville | 8/24/1932
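Once YearsOld is no longer stored, it can be computed from Founded whenever it is needed. The Python sketch below assumes the dates are month/day/year strings as in the tables; the function name years_old and the reference date of January 15, 2001 are assumptions chosen for illustration.

```python
from datetime import date, datetime


def years_old(founded: str, today: date) -> int:
    """Whole years elapsed between a M/D/YYYY founding date and `today`."""
    f = datetime.strptime(founded, "%m/%d/%Y").date()
    # Subtract one year if this year's anniversary has not been reached yet.
    before_anniversary = (today.month, today.day) < (f.month, f.day)
    return today.year - f.year - int(before_anniversary)


# Founded values from Table 4.13.
CITY = {"Roberton": "12/28/1941", "Willville": "8/24/1932"}

reference_day = date(2001, 1, 15)   # hypothetical "today"
ages = {name: years_old(founded, reference_day) for name, founded in CITY.items()}

one_year_later = date(2002, 1, 15)
later_ages = {name: years_old(founded, one_year_later) for name, founded in CITY.items()}
```

On the assumed reference date the computed ages are 59 and 68, the values that were stored in Table 4.6; a year later they are 60 and 69, which is exactly the drift a stored YearsOld column would fail to reflect.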

After you have completed normalization through the 3NF, you have designed an efficient database model that will offer optimized storage requirements, data maintenance, and querying capabilities for OLTP systems.

In the following sections, we use the designer tool available in SQL Server to create a simple ER diagram for our Southwind database. From our ER diagram, we can create our database tables to store information for our application. Our ER diagram will be based on the logical model we created in Figure 4.7.

URL:

https://www.sciencedirect.com/science/article/pii/B9781928994190500075

Accounting

Uday S. Murthy, in Encyclopedia of Information Systems, 2003

IV.C Data Repository for Storing Revenue-related Information

The ER diagram in Fig. 4 depicts the various entities and relationships that must be represented in the enterprise database for storing information related to revenue business processes for the illustrative retailing firm scenario. A standard set of conversion rules is applied to deduce the relational tables that should result from the ER diagram. The conversion rules are as follows: (1) a separate table is created for each entity; (2) attributes are created for each entity and the primary key in each entity is identified; (3) for the purpose of conversion, all "optional" relationships are treated as "mandatory many" relationships; (4) the primary key of the entity on the "one" side of a relationship is posted to the table of the entity on the "many" side of a relationship; and (5) a separate table is created for the relationship itself for entities participating in an M:M relationship, with the primary key of each table being posted to the new relationship table to form a composite key. Attributes unique to many-to-many relationships are posted as nonkey attributes in the composite key table. Applying the conversion rules and streamlining the resulting tables to eliminate redundancies and inconsistencies, we arrive at the set of tables shown in Fig. 5. Primary keys are underlined and foreign keys are indicated with an asterisk at the end of the field.

Figure 5. Tables for revenue processing subsystem.
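Conversion rules 1, 2, 4, and 5 can be sketched as a small Python function. This is a simplified illustration under stated assumptions: the entities and relationships below (Customer, Sale, Inventory, SaleLine) are hypothetical stand-ins rather than the contents of Fig. 4 or Fig. 5, rule 3 is not modelled, and the function name er_to_tables is invented for the example.

```python
def er_to_tables(entities, relationships):
    """entities: {name: {"key": [...], "attrs": [...]}}
    relationships: dicts whose "kind" is "1:M" or "M:M"."""
    # Rules 1 and 2: one table per entity, with its primary key and attributes.
    tables = {
        name: {"key": list(spec["key"]),
               "columns": list(spec["key"]) + list(spec["attrs"]),
               "foreign_keys": []}
        for name, spec in entities.items()
    }
    for rel in relationships:
        if rel["kind"] == "1:M":
            # Rule 4: post the key of the "one" side into the "many" side.
            one, many = rel["one"], rel["many"]
            for col in entities[one]["key"]:
                tables[many]["columns"].append(col)
                tables[many]["foreign_keys"].append((col, one))
        elif rel["kind"] == "M:M":
            # Rule 5: a relationship table whose composite key joins both keys;
            # attributes of the relationship become nonkey columns.
            left, right = rel["between"]
            key = list(entities[left]["key"]) + list(entities[right]["key"])
            tables[rel["name"]] = {
                "key": key,
                "columns": key + list(rel.get("attrs", [])),
                "foreign_keys": [(c, left) for c in entities[left]["key"]]
                                + [(c, right) for c in entities[right]["key"]],
            }
    return tables


# Hypothetical model, for illustration only.
entities = {
    "Customer": {"key": ["customer_no"], "attrs": ["name"]},
    "Sale": {"key": ["sale_no"], "attrs": ["sale_date"]},
    "Inventory": {"key": ["item_no"], "attrs": ["description"]},
}
relationships = [
    {"kind": "1:M", "one": "Customer", "many": "Sale"},
    {"kind": "M:M", "name": "SaleLine", "between": ("Sale", "Inventory"),
     "attrs": ["quantity"]},
]
tables = er_to_tables(entities, relationships)
```

In this hypothetical model, Sale gains customer_no as a foreign key, and SaleLine is created with the composite key (sale_no, item_no) and quantity as its nonkey attribute.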

URL:

https://www.sciencedirect.com/science/article/pii/B0122272404000010

Database Design Case Study 2

Jan L. Harrington, in Relational Database Design (Third Edition), 2009

Designing the Tables

The ER diagram in Figure 12-18 produces the following tables:

volunteer (volunteer_numb, first_name, last_name, street, city, state_code, zip, phone)

state (state_code, state_name)

availability (volunteer_numb, day_code, start_time, end_time)

day (day_code, day_name)

skill (skill_numb, skill_description)

skills_known (volunteer_numb, skill_numb)

job (job_numb, job_description, job_date, job_start_time, estimated_duration, numb_volunteers_needed)

job_skill_required (job_numb, skill_numb, numb_volunteers_with_skill)

volunteer_scheduled (volunteer_numb, job_numb, skill_numb)

URL:

https://www.sciencedirect.com/science/article/pii/B9780123747303000140

Normalization

Jan L. Harrington, in Relational Database Design (Third Edition), 2009

Translating an ER Diagram into Relations

An ER diagram in which all many-to-many relationships have been transformed into one-to-many relationships through the introduction of composite entities can be translated directly into a set of relations. To do so:

Create one table for each entity.

For each entity that is only at the "one" end of one or more relationships and not at the "many" end of any relationship, create a single-column primary key, using an arbitrary unique identifier if no natural primary key is available.

For each entity that is at the "many" end of one or more relationships, include the primary key of each parent entity (those at the "one" end of the relationships) in the table as foreign keys.

If an entity at the "many" end of one or more relationships has a natural primary key (for example, an order number or an invoice number), use that single column as the primary key. Otherwise, concatenate the primary key of its parent with any other column or columns needed for uniqueness to form the table's primary key.

Following these guidelines, we end up with the following tables for the Antique Opticals database:

Customer (customer_numb, customer_first_name, customer_last_name, customer_street, customer_city, customer_state, customer_zip, customer_phone)

Distributor (distributor_numb, distributor_name, distributor_street, distributor_city, distributor_state, distributor_zip, distributor_phone, distributor_contact_person, contact_person_ext)

Item (item_numb, item_type, title, distributor_numb, retail_price, release_date, genre, quant_in_stock)

Order (order_numb, customer_numb, order_date, credit_card_numb, credit_card_exp_date, order_complete?, pickup_or_ship?)

Order item (order_numb, item_numb, quantity, discount_percent, selling_price, line_cost, shipped?, shipping_date)

Purchase (purchase_date, customer_numb, items_received?, customer_paid?)

Purchase item (purchase_date, customer_numb, item_numb, condition, price_paid)

Actor (actor_numb, actor_name)

Performance (actor_numb, item_numb, role)

Producer (producer_name, studio)

Production (producer_name, item_numb)

Note: You will see these relations reworked a bit throughout the remainder of the first part of this book to help illustrate various aspects of database design. However, the preceding is the design that results from a direct translation of the ER diagram.
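To make the key rules concrete, a cut-down subset of these relations can be declared in SQLite through Python's sqlite3 module. This is a sketch under assumptions, not the book's schema: only a few columns are kept, the Order relation is renamed orders and Order item becomes order_item (ORDER is an SQL keyword), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE customer (
    customer_numb      INTEGER PRIMARY KEY,
    customer_last_name TEXT
);
CREATE TABLE item (
    item_numb INTEGER PRIMARY KEY,
    title     TEXT
);
-- "Many" end with a natural single-column key; the parent key is a foreign key.
CREATE TABLE orders (
    order_numb    INTEGER PRIMARY KEY,
    customer_numb INTEGER NOT NULL REFERENCES customer(customer_numb),
    order_date    TEXT
);
-- Composite entity: parent keys are concatenated to form the primary key.
CREATE TABLE order_item (
    order_numb INTEGER NOT NULL REFERENCES orders(order_numb),
    item_numb  INTEGER NOT NULL REFERENCES item(item_numb),
    quantity   INTEGER,
    PRIMARY KEY (order_numb, item_numb)
);
""")

# Invented sample rows.
conn.execute("INSERT INTO customer VALUES (1, 'Doe')")
conn.execute("INSERT INTO item VALUES (10, 'Sample title')")
conn.execute("INSERT INTO orders VALUES (100, 1, '2009-01-01')")
conn.execute("INSERT INTO order_item VALUES (100, 10, 2)")


def try_insert(sql):
    """Return True if the statement succeeds, False on a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert("INSERT INTO order_item VALUES (100, 10, 1)")  # repeats the composite key
orphan_ok = try_insert("INSERT INTO order_item VALUES (999, 10, 1)")     # refers to a missing order
row_count = conn.execute("SELECT COUNT(*) FROM order_item").fetchone()[0]
```

Both rejected inserts show the translation rules at work: the concatenated primary key prevents a second row for the same order and item, and the foreign key prevents a row for an order that does not exist.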

URL:

https://www.sciencedirect.com/science/article/pii/B9780123747303000061

Foreword

Dr. Gordon C. Everest, in Information Modeling and Relational Databases (Second Edition), 2008

ORM does not supplant ER diagrams or relational database designs; rather, it is a stage that comes before them. It can enable, enlighten, and inform our development and understanding of ER/relational data models. We build records more for system efficiency than for human convenience or comprehension. The premature notion of a record (a cluster of attribute domains along with an identifier to represent an entity) actually gets in the way of good data modeling. ORM does not involve records, tables, or attributes. As a consequence, we don't get bogged down in "table think," and there is no need for an explicit normalization process.

URL:

https://www.sciencedirect.com/science/article/pii/B9780123735683500035

Data Modeling for the Structured Environment

W.H. Inmon, ... Mary Levins, in Data Architecture (Second Edition), 2019

The Dis

The next level of the data model is the place where much detail is found. This level of the data model is called the "data item set" (dis).

Each entity identified in the ERD has its own dis. Using the simple example shown in Fig. 7.3.3, there would be one dis for customer, another dis for order, another dis for product, and yet another dis for shipment.

The dis contains keys and attributes, and the dis shows the organization of the data.

The symbol for a simple dis is seen in Fig. 7.3.4.

Fig. 7.3.4. A data item set—dis.

The basic construct of a dis is a box. In the box are the elements of data that are closely related and that belong together. The different lines between the groupings of data have meaning. A downward-pointing line indicates multiple occurrences of data. A line to the right indicates a different type of data.

As a simple example of a dis, consider the dis shown in Fig. 7.3.5.

Fig. 7.3.5. A simple dis.

The anchor or primary data are indicated by the box of data that is at the top left of the diagram. The anchor box indicates that the data that relate directly to the key of the box are description, unit of measure, unit manufacturing cost, packaged size, and packaged weight. The elements of data exist once and only once for each product.

Data that can occur multiple times are shown beneath the anchor box of data. One such grouping of data is component id. There can exist multiple components for each product. Another grouping of data that is independent of component id is inventory date and location. The product may have been inventoried in multiple places on different dates.

The lines going to the right of the anchor box indicate types of data. In this case, a product may be used in flight or in ground support.

The dis indicates the keys, attributes, and relationships for an entity.

URL:

https://www.sciencedirect.com/science/article/pii/B9780128169162000255

Formalization

Jean-Louis Boulanger, in Certifiable Software Applications 3, 2018

6.2.3.9 Entity-relationship diagrams

The entity-relationship model is widely used for designing databases and can also be used to describe the data of a system and their structure.

As with the class diagram, the entity-relationship diagram is easy to use and powerful enough to represent relational structures. It is mainly based on a graphical representation that makes it easy to understand.

Figure 6.17. Example of a class diagram

The concepts used are:

entity-types: a set of entities that share data characteristics (e.g. "student", "courses", "university");

relationship-types: a set of relationships between entity-types (e.g. "study", "teach", "is registered with");

attributes: a property of an entity-type or a relationship-type;

cardinality (of a relationship): the number of relationship instances in which an entity can participate.

URL:

https://www.sciencedirect.com/science/article/pii/B9781785481192500066

Key Concepts

Danette McGilvray, in Executing Data Quality Projects (Second Edition), 2021

Data Model Example

Figure 3.13 shows an example of an entity/relationship diagram (ERD). As mentioned, there are many notations for data models. This figure is rendered according to Approach A, the Row 2 "Semantic" model in Table 3.6 – Data Model Comparison, which follows further in this section. Cardinality and optionality are shown according to the discipline of Richard Barker and Harry Ellis, also used by Approach A. In the diagram:

Figure 3.13. An Entity/Relationship Diagram (ERD).

Customer, Sales Order, SO Line Item, Product Type, etc. are examples of entity types.

Note that City, State, and Country are also entity types.

But each of these is also what is called a "sub-type of" the entity type Geographic Area.

That means that each occurrence of a sub-type (for example, City) is, by definition, also an occurrence of the super-type, Geographic Area.

"SO Number", "SO Issued Date", and "SO Completed Date" are attributes of Sales Order.

The asterisk (*) means the attribute is mandatory.

A circle means it is optional.

A hashtag with underline (#) means the attribute is part of the unique identifier for the entity type. The attribute name is also underlined.

The lines between pairs of entity types are examples of Relationships.

Cardinality and optionality of each relationship are further clarified by the text at each end of the line and can be stated in a sentence.

By naming relationships in this way, the resulting sentences must be either true or not true, when presented to someone in the business side of the organization.

Examples of relationships with resulting sentences:

Each Customer may be buyer in one or more Sales Orders.

Optionality of "may be" is indicated by the dashed line nearest Customer.

Cardinality of "one or more" is indicated by the crow's foot nearest Sales Order. Note: For cardinality, the crow's foot notation indicates "many" by its many "toes." It was invented by Gordon Everest, who originally used the term "inverted arrow" (Everest, 1976).

Alternatively, each Sales Order must be sold to one and only one Customer.

The Optionality "must be" is indicated by the solid line nearest the subject entity type (in this case Sales Order).

The Cardinality "one and only one" is indicated by the absence of a crow's foot nearest the object entity type (in this case Customer).

Example of relationship that is also part of the unique identifier:

A vertical bar next to a crow's foot means that the nearest relationship is also part of the unique identifier.

Each instance of a Sales Order is uniquely identified only by SO Number (indicated by #), whereas each instance of an SO Line Item is identified by a combination of the attribute # SO Line Number and the relationship "part of" Sales Order.
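The way optionality and cardinality combine into a readable sentence can be sketched in Python. The class RelationshipEnd and its field names are hypothetical, invented for this illustration; only the two example sentences come from the text.

```python
from dataclasses import dataclass


@dataclass
class RelationshipEnd:
    subject: str       # entity type the sentence starts from
    mandatory: bool    # solid line nearest the subject -> "must be"; dashed -> "may be"
    role: str          # relationship name, e.g. "buyer in"
    many: bool         # crow's foot nearest the object -> "one or more"
    obj: str           # entity type at the other end (singular)
    obj_plural: str    # plural form used after "one or more"

    def sentence(self) -> str:
        optionality = "must be" if self.mandatory else "may be"
        if self.many:
            return f"Each {self.subject} {optionality} {self.role} one or more {self.obj_plural}."
        return f"Each {self.subject} {optionality} {self.role} one and only one {self.obj}."


customer_end = RelationshipEnd("Customer", False, "buyer in", True,
                               "Sales Order", "Sales Orders")
order_end = RelationshipEnd("Sales Order", True, "sold to", False,
                            "Customer", "Customers")
sentences = [customer_end.sentence(), order_end.sentence()]
```

Reading one relationship from each end yields the two sentences given above, which a business reviewer can confirm as true or not true.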

Data models at this level are usually used by data modelers only. A good data modeler will facilitate discussions and use simpler diagrams that can be understood by business audiences who need to verify the data and relationships, yet do not need to interpret a detailed model. If needed, someone on your data team with good communication skills can work with the data modeler to ensure those involved can provide input to and comprehend what the data model means to the business and how it can be used by the technologist.

URL:

https://www.sciencedirect.com/science/article/pii/B9780128180150000098

Normalization

Toby Teorey, ... H.V. Jagadish, in Database Modeling and Design (Fifth Edition), 2011

The Design of Normalized Tables: A Simple Example

The example in this section is based on the ER diagram in Figure 6.4 and the following FDs. In general, FDs can be given explicitly, derived from the ER diagram, or derived from intuition—that is, from experience with the problem domain.

Figure 6.4. ER diagram for employee database.

1. emp_id, start_date -> job_title, end_date

2. emp_id -> emp_name, phone_no, office_no, proj_no, proj_name, dept_no

3. phone_no -> office_no

4. proj_no -> proj_name, proj_start_date, proj_end_date

5. dept_no -> dept_name, mgr_id

6. mgr_id -> dept_no

Our objective is to design a relational database schema that is normalized to at least 3NF and, if possible, minimize the number of tables required. Our approach is to apply the definition of 3NF given previously to the FDs given above, and create tables that satisfy the definition.

If we try to put FDs 1–6 into a single table with the composite candidate key (and primary key) (emp_id, start_date) we violate the 3NF definition, because FDs 2–6 involve left sides of FDs that are not superkeys. Consequently, we need to separate FD 1 from the rest of the FDs. If we then try to combine 2–6 we have many transitivities. Intuitively, we know that 2, 3, 4, and 5 must be separated into different tables because of transitive dependencies. We then must decide whether 5 and 6 can be combined without loss of 3NF; this can be done because mgr_id and dept_no are mutually dependent and both attributes are superkeys in a combined table. Thus, we can define the following tables by appropriate projections from 1–6.

emp_hist: emp_id, start_date -> job_title, end_date

employee: emp_id -> emp_name, phone_no, proj_no, dept_no

phone: phone_no -> office_no

project: proj_no -> proj_name, proj_start_date, proj_end_date

department: dept_no -> dept_name, mgr_id; mgr_id -> dept_no

This solution, which is BCNF as well as 3NF, maintains all the original FDs. It is also a minimum set of normalized tables. In the "Determining the Minimum Set of 3NF Tables" section, we will look at a formal method of determining a minimum set that we can apply to much more complex situations.
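The superkey reasoning above can be checked mechanically with attribute closure. The Python sketch below is a simplified check, not the formal method the authors refer to: it tests only the six listed FDs against each table rather than every FD implied by them, and the helper names closure, fd, and bcnf_violations are hypothetical.

```python
def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Build an FD from two space-separated attribute lists."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


FDS = [
    fd("emp_id start_date", "job_title end_date"),
    fd("emp_id", "emp_name phone_no office_no proj_no proj_name dept_no"),
    fd("phone_no", "office_no"),
    fd("proj_no", "proj_name proj_start_date proj_end_date"),
    fd("dept_no", "dept_name mgr_id"),
    fd("mgr_id", "dept_no"),
]

TABLES = {
    "emp_hist": "emp_id start_date job_title end_date",
    "employee": "emp_id emp_name phone_no proj_no dept_no",
    "phone": "phone_no office_no",
    "project": "proj_no proj_name proj_start_date proj_end_date",
    "department": "dept_no dept_name mgr_id",
}


def bcnf_violations(table_attrs, fds):
    """Listed FDs that apply inside the table but whose left side is not a superkey of it."""
    attrs = set(table_attrs.split())
    bad = []
    for lhs, rhs in fds:
        inside = (rhs & attrs) - lhs
        if lhs <= attrs and inside and not attrs <= closure(lhs, fds):
            bad.append((sorted(lhs), sorted(inside)))
    return bad


per_table = {name: bcnf_violations(attrs, FDS) for name, attrs in TABLES.items()}

# One wide table holding every attribute, as in the first attempt described above.
all_attrs = " ".join(sorted(set().union(*(lhs | rhs for lhs, rhs in FDS))))
single_table = bcnf_violations(all_attrs, FDS)
```

Under this check the five tables report no violations, while the single wide table reports FDs 2 through 6, in line with the argument that their left sides are not superkeys there.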

Alternative designs may involve splitting tables into partitions for volatile (frequently updated) and passive (rarely updated) data, consolidating tables to get better query performance, or duplicating data in different tables to get better query performance without losing integrity. In summary, the measures we use to assess the trade-offs in our design are:

Query performance (time).

Update performance (time).

Storage performance (space).

Integrity (avoidance of delete anomalies).

URL:

https://www.sciencedirect.com/science/article/pii/B9780123820204000100