|
Database design scenarios - I In the course of the past few articles, we have outlined a practical method of designing databases for your applications. When you actually try to implement the database design, you will be faced with multiple choices. After having designed, fine-tuned and optimised a multitude of databases for various clients and projects, I have observed some common data design scenarios that are applicable to most projects.
I would like to highlight these scenarios for the benefit of all the readers. Wherever applicable, I will also highlight the common mistakes that are performed in a given situation. For visual clarity, I will use the Logical Design model from ERWIN Logic Works for illustrations.
For the purpose of illustration, you can consider the word 'entity' as the synonym of a table in a database. The actual difference is that the entity is logical (it exists only in an abstract model of the system) and a table is physical (it exists in a real database).
Masters
This is probably the most common type of table created for all applications. Master entities (tables) typically contain the basic data that business transactions use. Master entities have a standard list that needs to be updated periodically. All transactions usually refer to master entities. Therefore, the design of master entities affects the design of the actual system substantially.
Examples of masters include the following:
1. List of products manufactured, to be used by various applications like sales, distribution, marketing, etc.
2. List of subscribers would be a master table for a phone company. The billing table would contain transactions that refer to the subscriber master for decoding the billing details of each customer.
3. List of diseases would be a master list of possible diagnoses. Each patient would be referring to a particular diagnosis from the diagnosis master.
All masters normally have a code and a description attribute. More attributes will be required as per the functional requirements. There are some important issues which you must consider while designing masters.
Codes
The unique primary key for master tables is often used as a shortcut for entering an item from the field.
Code Product Name PENC Pencil FPN Fountain Pen BI Blue Ink KI Black Ink
For example, the Product table could be as shown alongside:
The interface would be implemented as shown below:
Here if you know the code, you can type the code and press the Tab key. The code will be internally decoded to the actual product name and displayed.
If you do not know the code, you can simply open the list and view the available products and their codes, and select the required one.
At first glance, there does not seem to be any problem with the design of the table. However, there is a problem here. There is one base rule that concerns every item that is added by end-users. "You must allow for any user-entered item to be edited and deleted." This is a general rule and can have exceptions. But mind you, exceptions are rare.
This means that if a new user does not find that the code "KI" intuitively identifies Black Ink, and wants to change it to "BKI", your system must allow for this change. The problem is that the code "KI" has already appeared in all previous transaction logs, audit and other tables that have referred to the Black Ink sale in some way. For example the table for transactions of sale may be as follows:
ProductID Qty Date Discount KI 33 3 Mar 1999 0 PENC 3 4 Feb 1999 2 KI 44 4 Mar 1999 4
Now if the user intends to change the code "KI" to "BKI", you will have to find all references to "KI" and change these to "BKI". This may involve multiple tables.
In any case this is a cumbersome step. You can simply eliminate this complexity and still allow the user to update the shortcuts by this simple design change. Here's how your Product master should be designed:
Code Shortcut Product Name 112 PENC Pencil 115 FPN Fountain Pen 110 BI Blue Ink 190212 KI Black Ink Here the code is an internally used value. The end-users never know about the code. Their interaction occurs only with the product name and shortcut.
All the transaction tables and other referring tables that use a link to the Product master will always use the Code field. Both the name and the shortcuts can be changed without having to change anything in referring tables.
Managing complex codes
One of the basic principles of database design recommends that there should be only one meaning attached to one attribute. However, in practice, this simple rule is often violated.
You are likely to encounter this problem more often if you are in the process of migrating an older database design into a new application. Typical misuse of this rule is for codes of various items. Here is an example.
Consider a Product master table that identifies a product by the attribute called Product Code. In this example this field is 12 characters long. So far so good. However the Product Code does much more than just identify a product uniquely. It has more information than this. Here is how the information is stored:
Characters Meaning 1 Product Group (1=Consumer Durable,2 = Electronics, 3 = Communication Device, 4=Cosmetics) 2-4 Manufacturing Location (DAM = Daman, DEL = Delhi, PAN = Panvel) 5 This is a flag which indicates whether this product is available in multiple packaging 6-8 Indicates packaging type 9-12 Unique code for the product itself Nothing may seem wrong with the scheme of things here. However consider the following disadvantages:
Insufficient space: Suppose the product groups become more than 9, there is absolutely no way of changing the code. The problem is not as simple as increasing the width of the field by one character. You will have to change the size of the product code reference field in all dependent tables. In addition, you will have to change the source code that was earlier looking for the first character to finding the product group. Now the code must look for the first two characters.
Remapping is very difficult and cumbersome: Suppose a new category called Herbal products is introduced and some products from the Cosmetics product group have to be shifted to it. The product code will now change as the group has changed. This requires you to find and replace all references to the older code with the new code in all dependent tables. If this application runs at multiple sites, you will have to perform this search at all sites.
It is impossible to index: It is quite common to require a filtering of products by product group or manufacturing location, etc. However, the complete field can not be used for this filtering. You will need to use a substring of the complete field data to find the group or any other information.
While xBase type of engines allow the use of a substring function in an index expression, no RDBMS allows you to use a substring or any other function in the index key expression. You can only use the whole field, or a combination of multiple fields.
Thus when you implement this design using an RDBMS engine, you cannot have any useful index for searches based on product group or manufacturing location or packaging type. This will hamper the performance of these queries, as there will always be a table scan required for executing these queries.
Hidden meaning: The exact meaning of each type of product group and its representation as a part of the code is not clearly visible anywhere. If someone changes or adds a new meaning, there is no easy and guaranteed way of making all dependent applications understand and utilise this change.
For example, if you want to add a new product group code called Herbal, the fact that it is actually represented as number 4 will be only evident from the front-end code where a particular option is mapped to this type.
A possible solution
The simplest solution, as discussed earlier, is to provide a mapping table for each additional categorisation of the products. Thus the Product Code field will simply contain the pure product code itself and nothing else. Additional attributes will be added to the table for product group, manufacturing locations and packaging, etc. These codes will be based upon separate tables for product groups, manufacturing locations and packaging types.
Problem with this solution: This modified design is fine. But one problem still remains. End-users are now used to the older method of understanding everything about the product by just looking at the code and deciphering the meaning of each of the subparts. Now this information does not exist in product code. Therefore, users may not accept the new design even though it is technically very efficient. Thus we need to refine this solution further.
The final solution: The idea is to use the best of both worlds. Allow end-users to use the existing composite code as a shortcut and use a unique product code internally.
There is one more addition required. When a new product is added, you must create this shortcut code manually by adding the relevant shortcuts for product group, manufacturing location, etc, for the convenience of the user.
|