|
Database design scenarios - III After having explored the importance of volume testing in the initial stages of database design, let's take a look at a few more scenarios that we may come across while designing databases.
Mapping/decoding relationships
Simple mapping relationships are typically used for providing additional information about an entity. For example, the transaction table may contain a field called Currencyid. CurrencyId is the primary key of the Currency table.
This is quite simple. However, in practice, the relationships are much more complex. A typical transaction would look like this:
At first glance, everything seems to be fine here. But there are some problems with this approach.
* The primary key of the transaction table has become a composite key. This makes a much larger index.
* An index based upon a composite key can be utilised for searches that are based upon only the first attribute in the key. The other two attributes, if subject to frequent searching, will need separate indexes on each field. This will increase the index sizes and degrade the performance of table updates.
* If this transaction table itself is the parent of another table, all these three primary key attributes need to migrate to this new entity. This entity would also require a composite index for searching purposes. This type of multi-layered hierarchy is quite common. Thus in a fairly complex application, it is possible that such inefficient design occurs for multiple entities. This and similar flaws in design will eventually lead to sub-optimal performance.
The solution to this problem is very simple. All you need to do is to define all these attributes as non-primary key attributes and create a unique row identifier as the primary key for the transaction table.
Here is how it is done. This diagram also shows that even if the transaction table has further child tables, only one field flows down as the foreign key.
Now the index on Transaction Number will be a simple index which is more efficient. The foreign keys like customer code, currency code remain as foreign keys for managing the integrity of the data.
The Details table just requires the reference to Transaction Number. Please note that there is no primary key for the Details table. If your application requires unique separate access to each row of the Details table, you can add another "detail number" primary key to the table.
Hierarchical relationship
One common requirement involves storing hierarchy. Hierarchy refers to a tree structure with one root and multiple branches. One branch can have multiple sub-branches and so on.
Another example could be based on classifications of animals, plants, diseases, and so on.
A different type of usage of hierarchy is to provide a representation of the structured flow of decisions or actions based upon some test.
Storing such hierarchy is really very simple. The table structure is as follows:
The root department would have the Reporting To field as NULL. All other departments will have another departmentID as the foreign key. The same structure will be usable for all other hierarchies mentioned above. Further refinements can be made to this structure to provide added features like:
* One node having more than one branch as parents.
* Ability to have multiple roots.
* User-definable trees where this mapping occurs at runtime and is changeable by users.
Representing hierarchy in the user interface:
Unlike other elements of data design, which are represented as fields on a form, hierarchy typically requires a special control--the Treeview control which is bundled with most development tools. All development tools provide sample code to demonstrate how the information in a table can be shown in the Treeview control. The simple File Manager or Windows Explorer are examples of Treeview control.
A horizontal representation of a tree structure is more intuitive. The Organisation Chart builder in PowerPoint is an example of this. However, there is no built-in control in VB, VC or VFP that allows you to represent treeviews horizontally. You will have to code this ActiveX control using some tool.
Efficient index design
Unlike many non-relational systems, relational indexes are not considered part of the logical database design. Indexes can be dropped, added, and changed without impacting the database schema or application design in any way other than performance. Efficient index design is paramount to achieving good SQL Server performance. For these reasons, you should not hesitate to experiment with different indexes.
The optimiser reliably chooses the most effective index in the majority of cases. The overall index design strategy should be to provide a good selection of indexes to the optimiser, and trust it to make the right decision.
The following are index design recommendations:
* Examine the FROM and WHERE clauses of your SQL queries because this is the primary focus of the optimiser.
Each column listed in the WHERE clause or in the JOIN portions of the FROM clause is a possible candidate for an index. If you have too many queries to examine, pick a representative set or just the slow ones. If your development tool transparently generates SQL code, this is more difficult. Many of these tools allow the logging of the generated SQL syntax to a file or screen for debugging purposes, else use SQL Trace to capture the SQL syntax on the server side.
* Use narrow indexes.
Narrow indexes are often more effective than multi-column, compound indexes. Narrow indexes have more rows per page and fewer index levels, which improves performance. The SQL Server optimiser maintains distribution statistics on all prefix sets of a compound index. Therefore, if the first column or columns of a compound index have poor selectivity, the optimiser may not use the index.
The optimiser can rapidly and effectively analyse hundreds or even thousands of indexes and join possibilities. Having a greater number of narrow indexes provides the optimiser with more possibilities to choose from, which usually helps performance.
It is often best not to adopt a strategy of emphasising a fully covered query. It is true that if all columns in your SELECT statement are covered by a non-clustered index, the optimiser can recognise this and provide excellent performance. However, this often results in excessively wide indexes and stakes too much on the hope that the optimiser will use this strategy.
You should not have more indexes than are necessary to achieve adequate read performance because of the overhead involved in updating those indexes. However, even most update-oriented operations require far more reading than writing. Therefore, do not hesitate to try a new index if you think it will help; you can always drop it later.
* Use clustered indexes.
Appropriate use of clustered indexes can tremendously increase performance. Even UPDATE and DELETE operations are often accelerated by clustered indexes because these operations require much reading. You can only have a single clustered index per table, so use this index wisely. Queries that return numerous rows or queries involving a range of values are good candidates for acceleration by a clustered index. For example:
select * from Membertable where Member_No > 5000 and Member_No < 6000 * Examine column uniqueness.
This helps you decide what column is a candidate for a clustered index, non-clustered index, or no index. The following example query allows you to examine column uniqueness:
select count (distinct colname), count(*) from tablename This returns the number of unique values in the column and the total number of rows. Compare these values. On a 10,000-row table, 5,000 unique values would make the column a good candidate for a non-clustered index. On the same table, 20 unique values would better suit a clustered index. Three unique values should not be indexed at all. These are only examples, not hard-and-fast rules. Remember to place the indexes on the individual columns listed in the WHERE and FROM clauses of the queries.
* Examine data distribution in indexed columns.
A long-running query often is caused by indexing a column with few unique values or performing a JOIN on such a column. This is a fundamental problem with the data and query itself and cannot usually be resolved without identifying this situation.
In addition to the above query, which gives a single figure for column uniqueness, you can use a GROUP BY query to see the data distribution of the indexed key values. This provides a higher resolution picture of the data, and a better perspective for how the optimiser views the data.
The following example query allows you to examine data distribution of indexed key values, assuming a two-column key on Col1 and Col2.
select Col1, Col2, count(*) from tablename group by Col1, Col2 This returns one row for each key value along with a count of the instances of each value. To reduce the number of rows returned, it may be helpful to exclude some with a HAVING clause.
The number of rows returned in a query is also an important factor in index selection. The optimiser considers a non-clustered index to cost at least one page I/O per returned row. At this rate, it quickly becomes more efficient to scan the entire table. This is another reason to restrict the size of the result set or to locate a large result set with a clustered index.
Do not always equate index usage with good performance, and vice versa. If using an index always produced the best performance, the optimiser's job would be simple--always use any available index. In reality, incorrect choice of indexed retrieval can result in poor performance. Therefore, the optimiser's task is to select indexed retrieval where it will help performance and avoid indexed retrieval where it will hurt performance.
|