|
Database design methodology - V In order that searches and joins can be carried out efficiently, it is important to have indexes created for primary and foreign keys in a database table. Apart from the clustered indexes discussed last time, other types of keys called 'Alternate keys' and 'Inversion entries' can also be used. Alternate keys are unique identifiers like the primary key. However these have not been designated as the primary key. You need to create indexes on this field if this appears in search conditions for critical queries.
There may be more indexes required for the purpose of improving performance of critical queries. There are non-key fields which may not be unique, or may not be referring to another parent entity. However, searches need to be conducted on these fields too. Examples of such fields include Pincode, Name, City, etc. Such non-unique searchable attributes are called inversion entries.
Indexes that contain more than one field are called compound indexes. As far as possible, it is preferable to minimise compound indexes. Compound indexes are larger, take longer to update and can be used for queries that pertain only to the first field in the index key. Therefore, these should be used only when absolutely required.
Create covering indexes
A covering index is a special type of index that provides tremendous performance benefits. It is used in cases where the search condition and the search result have common fields. Consider this query:
select Fname from Customers where Lname > "S Here assuming you have an index on Lname + Fname, when the index finds the entry for the Lname, it has also found the Fname associated with it. It does not require to go to the table data at all. All the information about the search and the output is available in a covering index. This should be used only when such a query is commonly required and is performance sensitive.
In any case, it is important to minimise the number of indexes per table. Indexes may increase the search performance but they reduce the add-update-delete performance. Thus a balance must be achieved between reading performance and writing performance by creating only useful indexes.
Generate entities in the database
If you are using a Case tool, now is a good time to actually forward engineer and create the entities, relationships, constraints, rules, indexes, views, stored procedures and any other database objects you have designed.
At this stage you may encounter problems related to entity names which are reserved words. These need to be resolved and the entities should be regenerated. If you have a multi-RDBMS system, you will need to repeat this step for each RDBMS.
Add constraints for the primary and foreign key
The generation process adds primary and foreign key references to the CREATE TABLE statements itself. If not, you will have to specify the relevant references for each table.
Create stored procedures for referential integrity
Declarative referential integrity: Here, you just have to specify which tables are parents and which are children. The rest of referential integrity management is performed by the base data engine. No additional code is required.
Create utilities for adding sample data
At this stage, it is important to test the functional performance of our data model. This can only be done when the amount of data in the tables is similar to the expected real life volume. This type of volume testing is often ignored till a later stage or is never performed with sufficient data. This leads to systems that run well in the initial phase, but slow down unbearably when the data volume increases.
Sample data can be added by various methods. Here are a few:
From existing data: If you already have data from some old system(s), you can load that data into the new design. This may require some amount of coding as the data structures may have changed. Simulate transactions programmatically: You could invoke stored procedures or middle layer business objects to add a large number of transactions programmatically. Generate data directly: By knowing the type of data that is likely to accumulate in a real life situation, you can programmatically add data. For example, the random number generating functions are very useful for generating numeric data. Add sample data
When adding sample data by using one or more of the above methods, you must perform tests with both moderate and large amounts of data. The amount of data inserted artificially must be documented and preferably backed up for future use. This backup serves as the base benchmark for future attempts at optimising the database for speed.
Write key procedures, queries and views
Now that you have the data structures and data, you can start coding the queries, views, procedures and so on. As soon as you write a query, you have the opportunity of testing its syntax, performance and optimisation level with real life data.
Test the performance critical queries
For procedures and queries, you must now benchmark the execution performance with large volumes of data. If this is within the limits defined in the functional requirements, further optimisation is not required.
When a query runs, you can view the query execution plan using features provided by the underlying data engine. This provides information about the tables, indexes, temporary tables used. It also provides information about whether the read and write occurred from the hard disk or from cache, as also information about whether there was a table scan (going through the table from first to last row) or was any index used.
Check for appropriate usage of indexes
Typically, one index is used per table for any query optimisation. In some cases, you may find that an index, that you expected the query engine to use, is not used at all. In such cases, it is important to ascertain the reason for this lack of use. The reason could be simply that the number of records in the table are so few that using an index is a more time consuming operation as compared to directlylooping through all records.
However there could be other reasons too, like the index not consisting of the search criteria, or that some other search criteria has already used another index in the given table, and so on. You need to ascertain the actual reason for non-usage and correct it at this stage.
Refine and optimise the queries
At the end of the above steps, you will have the complete data structures and constraints that have queries and procedures running as per the functional requirements.
If it is not possible to attain the expected performance results with realistic data, you must arrive at one of the following conclusions and act accordingly:
If you feel that the optimisation has not been achieved to the fullest extent, you need to continue refining the design till the results are achieved. If the performance requirements are being met using a smaller set of data, you need to attempt to reduce the live data size by purging the infrequently used data. If the hardware on which you are testing is of a lower configuration than the production system, you need to either procure a similar system for benchmarking or alter the performance requirements to offset the hardware limitations. If it is not possible to achieve the desired performance in spite of all technical fine tuning and hardware adequacy, you may reach the conclusion that the required performance is not achievable using the current system design and infrastructure. If this conclusion is reached, the end-users need to be educated about the difficulty and the performance parameters must be revised. Procedures to support middle layer objects
If you are using middle layer business objects, you may require to write more stored procedures for use by the business objects. These procedures will be called from middle layer components. The components will simply encapsulate a stored procedure call without adding any more code. You can also develop more complex components that have native code in addition to calling stored procedures.
Write event driven processes using triggers
The application may require to perform certain actions based upon the data which is entered or manipulated. This functionality is typically implemented using triggers. Trigger-based event codes can perform many functions like:
Providing an alert to some authority about invalid data. Automatically starting SQL or non-SQL processes upon specific adding or updating data. Using data from the updated row to calculate and update other fields in the row. Copying the row information into an archival table. Create database users and apply security constraints
Most RDBMSs provide user- and field-based security. Users are created in the database and actual OS login-ids are mapped to it.
As users use the various tables differently for each transaction type, applying security at the user, field, and functionality level is very cumbersome.
For object-based business logic, the security management is simpler. Objects typically login to the RDBMS at a high level of privilege and end-users do not have any direct access to it. Object usage by end-users can be specified by the roles that are mapped for the object. This simplifies security to the role-functionality matrix, which is much more manageable.
Go ahead with the middle layer and front end
At this stage, your data design is fairly solid and you can now concentrate on other aspects of application development
|