Database Design: From Novice to Professional [Ch.1: What Can Go Wrong]

·

6 min read

Database Design: From Novice to Professional [Ch.1: What Can Go Wrong]

Designing a poor database can lead to many problems that may not show at the time of creation of the database but will definitely show in the future. So in this chapter, we will show examples drawn from real life to illustrate some very basic types of problems encountered when data is stored in poorly designed tables and how this can be annoying in the future.

Mishandling Keywords and Categories

A common problem in database design is the failure to properly deal with keywords and categories. Many database applications involve data that is categorized in some way: products or events may be of interest to certain categories of people; customers may be categorized by age or interest or income (or all three).

When entering data, you usually think of an item with its particular list of categories or keywords. However, when you come to preparing reports or doing some analyses, you may need to look at things the other way round. You often want to see a category with a list of all its items or a count of the number of items. For example, you might ask “What percentage of our customers are in the high-income bracket?” If keywords and categories are not stored correctly initially, these reports can become very difficult to produce.

1.PNG.png

If we look up a plant, we can see immediately what its uses are. However, if we want to find all the plants suitable for hedging.

To produce a report of all hedging plants would require some logic along the lines of IF Usage1 = 'hedging' OR Usage2 = 'hedging' OR Usage3 = 'hedging'

if we decided to add new plant that have 4 usage we will need to add new column "usage4" Any logic will need to be altered to include " OR Usage4 = 'hedging' "

Changes such as I’ve been describing become too tedious to maintain. it never fulfills the potential of being able to conveniently suggest suitable plants for a prospective purchaser. Much of the usefulness of that carefully collected data on usages is lost.

design of the table makes it impossible to answer obvious questions conveniently.The Problem is that the developer did not take time to step back and consider the likely uses of the data.

The most obvious of these is “I want to find all the plants that have this particular use.”

a solution for the plant database problem, you can quite quickly set up a useful relational database by creating the two tables

2.PNG.png

Repeated information

Another common problem is unnecessarily storing the same piece of information several times. For example, in a small business, each order form may record the associated information of the customer’s name, address, and phone number

If we design a table that reflects such a form, the customer’s name, address, and phone number are recorded every time an order is placed. but problems occur when the customer moves house.

Sometimes the repeated information is not quite so obvious.

Team members of a long-term environmental project regularly visit farms and take samples to determine the numbers of particular insect species present. Each field has been given a unique code, and on each visit to a field a number of representative samples are taken. The counts of each species present in each sample are recorded.

3.PNG.png

there are problems. The fact that field ADhc is on farm 1(each field should be on exactly one farm) is recorded every visit.

this kind of design may leads to typo problems like the error of listing field ADhc under farm 2 (in row 269) instead of farm 1, maybe this typo is not big deal but imagine typo between field and farm so maybe one assign field to a farm that doesn't have this type of field

It is important to distinguish the difference between data input errors (anyone makes typos now and then) and design errors. The problem in Example 1-3 is not that field ADhc was wrongly associated with farm 2 (a simple error that could be easily fixed), but that the association between farm and field was recorded so many times that an eventual error became almost certain. And errors such as these can be very difficult to detect.

Another piece of information is repeated in the spreadsheet in Example 1-3: the date of a visit. The information that field ADhc was visited in Aug-06 is repeated in rows 268 to 278, creating another source of avoidable errors (e.g., we could accidentally put Sept-06 in row 273) that would affect any analyses based on date.

The repeated visit date information also gives rise to an additional and more serious problem: If we would like to add additional information to each date (e.g., it was raining in that date) does it go on every row for that visit (awkward and compounding the repeated information problem)?

solution: instead write all fields that's important we need redesign our table,

the designer would have thought about Farms, Fields and Visits as separate classes of data in which researchers are interested both individually and together. For example, the researchers may want to find information about farms of a particular size or fields with specific crops or visits undertaken just in the spring

4.PNG.png

As well as removing the problems with repeated data, the design in Figure 1-5 now gives room for additional information about each Field (e.g., size, soil type). The design also enables the recording of information about each Visit (e.g., weather conditions). now we can make more complex quires (analysis days which was raining)

Designing for a Single Report

A university department needed to have its final-year students’ provisional results in a format suitable to take along to the examiners’ meeting.

A database table was designed to exactly match the report

The next year the problems started. Can you figure them?

5.PNG.png

Some students who failed and then retook a subject but the full academic record for a student needed to be recorded, and the design of the table made it impossible to record more than one mark if a student did a subject several times.

this example is particularly good for showing how much trouble you can get into with a poor design. Once again, an inappropriate data model is to blame. The developer could see only one class: Student. His view was based on students as was the report. We should see that at the very minimum we have two classes, Student and Subject, and we are interested in the relationship between them.

We Came to the End in the next chapter we will show how an investigation of a Many–Many relationship such as the one between Subject and Student would have led to the introduction of another class, Enrollment. This allows different marks to be recorded for different attempts at a subject.

References

Beginning Database Design: From Novice to Professional