20 Database Design Best Practices





  1. Use well defined and consistent names for tables and columns (e.g. School, StudentCourse, CourseID ...).
  2. Use singular for table names (i.e. use StudentCourse instead of StudentCourses). Table represents a collection of entities, there is no need for plural names.
  3. Don’t use spaces for table names. Otherwise you will have to use ‘{‘, ‘[‘, ‘“’ etc. characters to define tables (i.e. for accesing table Student Course you'll write “Student Course”. StudentCourse is much better).
  4. Don’t use unnecessary prefixes or suffixes for table names (i.e. use School instead of TblSchool, SchoolTable etc.).
  5. Keep passwords as encrypted for security. Decrypt them in application when required.
  6. Use integer id fields for all tables. If id is not required for the time being, it may be required in the future (for association tables, indexing ...).
  7. Choose columns with the integer data type (or its variants) for indexing. varchar column indexing will cause performance problems.
  8. Use bit fields for boolean values. Using integer or varchar is unnecessarily storage consuming. Also start those column names with “Is”.
  9. Provide authentication for database access. Don’t give admin role to each user.
  10. Avoid “select *” queries until it is really needed. Use "select [required_columns_list]" for better performance.
  11. Use an ORM (object relational mapping) framework (i.e. hibernate, iBatis ...) if application code is big enough. Performance issues of ORM frameworks can be handled by detailed configuration parameters.
  12. Partition big and unused/rarely used tables/table parts to different physical storages for better query performance.
  13. For big, sensitive and mission critic database systems, use disaster recovery and security services like failover clustering, auto backups, replication etc.
  14. Use constraints (foreign key, check, not null ...) for data integrity. Don’t give whole control to application code.
  15. Lack of database documentation is evil. Document your database design with ER schemas and instructions. Also write comment lines for your triggers, stored procedures and other scripts.
  16. Use indexes for frequently used queries on big tables. Analyser tools can be used to determine where indexes will be defined. For queries retrieving a range of rows, clustered indexes are usually better. For point queries, non-clustered indexes are usually better.
  17. Database server and the web server must be placed in different machines. This will provide more security (attackers can’t access data directly) and server CPU and memory performance will be better because of reduced request number and process usage.
  18. Image and blob data columns must not be defined in frequently queried tables because of performance issues. These data must be placed in separate tables and their pointer can be used in queried tables.
  19. Normalization must be used as required, to optimize the performance. Under-normalization will cause excessive repetition of data, over-normalization will cause excessive joins across too many tables. Both of them will get worse performance.
  20. Spend time for database modeling and design as much as required. Otherwise saved(!) design time will cause (saved(!) design time) * 10/100/1000 maintenance and re-design time.

This entry was posted in . Bookmark the permalink.

37 Responses to 20 Database Design Best Practices

  1. Very thoughtful post. On point #5, unless your usage requires sending a password to another system, I would recommend hashing the stored passwords rather than encrypting them. Anytime an encryption key is available in memory or on disk, the passwords are probably quite easily accessed in the event of a hack. Your system might not have critical data, but a lot of users will use the exact same password in every other banking and financial site they visit. If you can, help protect users from themselves (and inform them of better tactics when given the chance).

  2. Anonymous says:

    Hash. Passwords. With a salt. With a a hashing method that's slow (in computer terms).

  3. Anonymous says:

    For 18, if this is SQL Server, there are options for storing blob and large text data out of row. http://msdn.microsoft.com/en-us/library/ms189087.aspx.

    sp_tableoption N'MyTable', 'large value types out of row', 'ON'

  4. Anonymous says:

    #7 I wouldn't say so. What is the difference between 100 and 'BAA' in terms of indexing?

  5. CB says:

    There is I/O difference. A varchar column will take
    more space than an integer and requires more I/O to read/write data. Also memory consumption will be higher and string operations are more complex than integer operations (i.e. increment/decrement, sorting etc.)

  6. About 16 "Use indexes for frequently used queries on big tables. Analyser tools can be used to determine where indexes will be defined. For queries retrieving a range of rows, clustered indexes are usually better. For point queries, non-clustered indexes are usually better.", are you sure, and why ? You must have a clustered index in your tables (most of the time it's the PK), so your query will user the clustered when filtering on the primarykey and in the other case the other indexes.

  7. Anonymous says:

    #7 Integers and there variants all require byte alignment prior to the comparison function, character types do not. The avoidance of the byte alignment step is why VARCHAR indexes are used in monster databases.

    Mike

  8. Anonymous says:

    I like prefixing columns with either Is, Has, Can or Must. This allows you to produce a more powerfull name for your column.

  9. CB says:

    @Remi: Some explanations exist about this issue:
    http://manuals.sybase.com/onlinebooks/group-as/asg1250e/ptallbk/@Generic__BookTextView/34251;hf=0
    http://technet.microsoft.com/en-us/library/ms190639.aspx
    http://www.sql-server-performance.com/2007/nonclustered-indexes/
    http://developergeeks.com/article/20/how-to-use-clustered-index-and-non-clustered-index-on-columns-in-sql-server

  10. #6 Using UNIQUEIDENTIFIER for primary keys makes it difficult (near impossible) for a user to "guess" a record id. IDs are often passed around the client (e.g. querystring in a web app). Using integers, the user can alter the ID possibly accessing otherwise inaccessible data.

  11. dpleo says:

    Prefixing objects may not be a bad practice. tblSchoolCourse, uspSchoolCourse, and vwSchoolCourse make it easy to see when reading code that a table, user stored procedure, or view is being accessed.

    Dave62

  12. CB says:

    Prefixing "tables" is a bad practice (#4). If you prefix one table, you'll have to prefix every table. It's clear that they are tables, in their context.

  13. Anonymous says:

    @Anonymous (Mike): Could you please provide some references concerning your statement that "varchar indexes are used in moster databases"? (Presumably, by "used" you meant "favored".)

    It seems to me, the system is more likely to have a memory bottleneck than a processor bottleneck. YMMV; when in doubt, profile, then optimize if needed.

  14. James says:

    #19 definitely is not a accurate statement. Normalization is good to keep data integrity, but not performance. Actually most of the times, Normalization is one factor which causes performance issue. When you deal with high traffic web applications or data intensive applications, de-normalization is the method we use to gain performance.

  15. Anonymous says:

    It's my understanding that a CHAR(3) occupies less storage, hence requires less I/O than an INT (4-bytes) - note I'm not talking about CHAR, not VARCHAR or NCHAR. So for example if you have a table of Airports or Airlines it would be better to use the 2 or 3-letter airport/airline code as a primary key - this field is unique and would probably be part of the table anyway so you save storage having to add an unnecessary INT primary key column. Also it is very convenient as a foreign key when browsing other tables as it is often immediately obvious which airport/airline is being referred to just from its 2/3 letter code. With a bit of work you could create your own 2/3 letter unique keys for other suitable tables instead of integers.

    Rob

  16. Jim S. says:

    On #4, you don't always have the context.

    For example, if I'm using intellisense in my editor, it's not always going to know that I want to enter a table name, so when I start typing, I get columns, functions, etc showing up. When you don't exactly remember the full name of the table, it just slows you down.

    Likewise, if you are saving your DB objects as scripts for source control, you don't have the context. Marking all your tables as tblWhatever.sql makes it that much faster to find what you're looking for.

    You need to be consistent, obviously, but I've found that the benefit of the extra three characters more than outweighs potential issues with prettily-formatted code.

  17. CB says:

    @Rob: Specifically for char(3), your I/O approach is true. But in real world with big tables, 3 char will not be enough.

  18. CB says:

    @Jim: For that kind of intellisense usage, it can help. Anyway I don't want to see 100 tables starting with Tbl on the DBMS user interface and ER diagrams. And I think writing 3 more characters on each query for each table is a bit time consuming.

    But these are just different viewpoints. Thanks for the comment.

  19. CB says:

    @James: I think performance is not only about less join operations of de-normalization. If your de-normalized "big table" is 4 GB in size, this will be another performance issue for disk and memory operations. So, the relation is two sided.

    Thanks for the comment.

  20. Prasad says:

    Nice list and very thoughtful.

  21. Anonymous says:

    In regard to namingconventions wouldn't tables usually be named with a noun (e.g. SchoolCourse) and stored procs including a verb (e.g. SearchSchoolCourses). As for views I see a possible conflict but since I haven't been using views much (I am not focused on designing databases in my daily work) I haven't really got much experience with naming those - I imagine some combination af nouns though (e.g. SchoolCoursesAndStudents)... :-)

    Jan

  22. Anonymous says:

    Useful suggestions

  23. Intersting range of comments, but almost all of them are negated by the fact that these "best practices" are specific to a specific database system (ms sql). For example, practice 8 is not practical since a bit field differs in each dbms and from a coding perspective they should all map to a single type. In C# you will need to query the type and cast the variable appropriately. Use a small (16 bit) integer type instead and this problem goes away - it works across all dbms types.

  24. Anonymous says:

    #2 is a matter of taste IMHO. When you put all your eggs in one basket, what do you have...a bag of egg? Not in my world.

    There are a whole lot of considerations that have to be weighed beofre deciding on indexes and datatypes. Any of your points on these concerns need to contain the phrase "it depends." On top of this, if you use INT everywhere for ID columns, you really need to be sure you have identified natural keys and wrapped them up in a unique constraint. You mention data integrity, so you don't want to break any of your own "rules" do you?

    #10 SELECT * is NEVER needed in a production ready dataabse project.

    #17 No other application should be installed on a database server. Period.

    #18 DB product specific. Some handle tables with BLOBs with ease. ANd the truth is if the BLOB needs to be returned via aquery, then the BLOB needs to be accessed anyway. Performance at this point is vendor specific, not database design reliant.

    For database design time and index planning, remember a successful application will be always altered. If users come to rely on a database application (or any application) then they will want more features over time. Design for change. Index for today, baseline it, review tomorrow.

    My guess is you are primarily a non-database developer, albeit perhaps with a strong interest in databases. Querying the interwebs for "recieved wisdom" is dangerous, even when finding "wisdom" straight from the vendor's site. For every rule of database design, there is a real world example where the rule breaks down. "Best practice" is a trap for the unwary.

  25. @dpleo
    An object name is an alias for the data the object represents, not the object itself. When you think in the problem domain, it doesn't matter if the data is coming from a view, stored proc or table, it just matters that the data is there. On top of that, it can become a maintenance pain. What happens if you have to change tblSchoolCourse to a view? You then have to go and rename it everywhere it's referenced, and if you're not the only one who works on that database, you're going to have an interesting time tracking them all down.

    Prefixing and postfixing is an old habit that really needs to die. With modern OO languages, for example, everything's an object, yet people still feel compelled to strFirstName or nAge. The intention of prefixing things was to describe the *purpose* of the thing, not the *type* of the thing, but that message got lost along the way. If they were true to their interpretation, then it would be objFirstName, objAge etc. And I actually have seen code like that...

  26. Anonymous says:

    CB, I think your answer to James on point 19 is a little off the mark. No matter what the table size, lets go big and assume it is 100GB in size, non-partitioned, unless the application is poorly designed or not designed for online use, the whole table will not be in memory. If a query causes a clustered index scan, then indexing needs to be looked at or the application request patterns. Even the smallest table work this way. Using MS SQL server as an example, the smallest amount of data that can be brought into memory from the disk is 8KB, one page. An efficient query engine only brings in data that is needed to service the query at hand. You can see this in action by creating a query joining say 20 tables, but only reference 1 in the select clause and the where clause. The query plan will not include the other 19 tables. In almost every case of slow database performance I have had to deal with, it comes down to indexing, not the number of joins. My rule of thumb is normalise to the 3rd normal form and put in production. It would be nice to be able to thoroughly stress test the database before going into production, but very few dev shops have this capacity. Most stress testing I have done is in a production environment. :)

    Dave

  27. Anonymous says:

    Very nice post. I have learn all the points in my 2 years experience. Why not you made this post 2 years before?

  28. Good pointers ..

    Need to add few points.

    1) Do Understand the strengths of underline architecture of Database your product is being built upon .. and use generic as well as specific (to DB) optimization techniques as much as possible.

    2) Do not utilize the single db for all use cases .. like the techniques defined above for partitioning, you may come across that the database itself can be divided into two or more MAJOR modules/usecases to distribute the load of types of operations

    3) Do not over-use the OLTP database .. and do understand and able to bifurcate the difference b/w OLAP and OLTP databases and for which use case which one is more suitable ..

    4) Real-time (or Interval) Replication should be used for load balancing (or for different types of usage) wherever applicable

    5) DB configuration should be properly tuned with the hardware .. and hardware procurement should be derived from the requirements of business.

  29. #7 & #18 can be combined. That is, make your records as narrow as possible. Use integers, enums, and bits over chars, varchars, etc. The narrower the record the more of them will fit in the caches. It is the caches that make your database fast. Generally, put meta-data in one table and content in other joined tables. And, as a previous commentator said, it is really important to know your database's performance characteristics regards to data volume and data contention.

  30. every data base designer must consider this rules. and another thing is this article has been written in very simple manner

  31. Anonymous says:

    I'd like to add one about security.

    #0 - DO NOT IMPLEMENT YOUR OWN CUSTOM SECURITY MECHANISM simply because you don't know how to use the security built into the database engine and do not want to learn how to.

    You'll waste more time (yours and your users) and hinder your database design thru attempts to work around limitations that you've unintentionally added to your DB design by using your own security mechanism. It takes lees effort to learn how to use the systems built in security then implemt you r own.

    At our work we deal with 3 different applications that use SQKL Server and each uses its own cluster F of a mess security process because the developers obviously had no clue on How security in SQL Server worked. Very frustrating.

  32. Anonymous says:

    Over all good post but not agreed by point no 4, some time prefixes or suffixes are helpful in querying Database or make queries more understandable at least some sort of pattern must be used to get the idea is it View, SP, Function, OR Table

  33. Anonymous says:

    #19 on normalization/denormalization focuses too narrowly on performance. Your logical data model should be highly normalized, because the purpose is to document and reflect the business understanding of the world. If denormalization is being considered, the critical question to be asked is "am I starting to lie about the world". Fast but wrong is not usually an acceptable solution.

    That said, in many circumstances enterprise data integration may allow substantial denormalization, for example in a data warehouse, because the system of record can be relied on to manage data integrity. Most normalization steps are intended to prevent anomalies due to inserts or updates, so if those never happen (e.g. a warehouse) then denormalization may be safe.

  34. Warehouse is different things in database design, They are totally differ. De-normalization is focus only logical data model.

  35. Anonymous says:

    Thanks for the post. Here’s a tutorial that shows how you can easily build an online database-driven web application with a parent-child table relationship, without codinghttp://blog.caspio.com/web-database/creating-one-to-many-relational-datapages/

  36. My 2 cents on
    1) Don't forget to create index on column used for searching.
    2) Don't put index on all columns it will make update and insert slow.
    3) Always have primary key in your table.
    4) don't over normalize it 3NF is good enough.

    See some of my Database performance tips for Java application here.

  37. yarrays says:

    Thanks for the post. Here’s a tutorial that shows how you can easily build an online database-driven web application with a parent-child table relationship, without codinghttp://blog.caspio.com/web-database/creating-one-to-many-relational-datapages/

Leave a Reply