Current location - Training Enrollment Network - Books and materials - How to design an excellent database (turn)
How to design an excellent database (turn)
A successful management system consists of [50% business +50% software], and 50% successful software consists of [25% database +25% program]. The quality of database design is a key. If the enterprise data is compared to the blood necessary for life, then the design of database is the most important part in the application.

There is a lot of information about database design, and there are also special stories in university degree courses. However, as we have repeatedly stressed, no matter how good a teacher is, he can't compare with the teaching of experience. So I summed up my detours and experiences over the years, and found some accomplished professionals in database design on the Internet to teach you some skills and experiences in database design. Selected 60 best skills and wrote them into this article, which is divided into five parts for the convenience of indexing:

Part 1-Before designing the database: This part lists the basic skills of 12, including naming specifications and defining business requirements. Part 2-Designing Database Tables: There are 24 guiding skills, covering the design of fields in tables and common problems that should be avoided. Part 3-Selecting Keys: How to Choose Keys? There are 10 tips specifically related to the correct use of system-generated primary keys and when and how to index fields for the best performance. Part 4-Ensuring data integrity: Discuss how to keep the database clear and robust, and how to minimize harmful data. Part 5-Skills: There are many other skills that are not included in the above four parts. I hope that with them, your database development will be easier. Part 1- Check the existing environment before designing the database.

When designing a new database, we should not only carefully study the business requirements, but also check the existing systems. Most database projects are not built from scratch; Usually, there will always be existing systems in the organization to meet specific needs (automatic calculation may not be possible). Obviously, the existing system is not perfect, otherwise you don't have to build a new system. But the study of the old system can help you find some subtle problems that may be ignored. Generally speaking, it is absolutely beneficial for you to review the existing system.

Define standard object naming specifications

Be sure to define naming conventions for database objects. For database tables, it is necessary to determine whether the table name is plural or singular from the beginning of the project. In addition, simple rules should be defined for table aliases (for example, if the table name is a word, the alias will take the first four letters of the word; If the table name is two words, take the first two letters of each word to form a four-letter alias; If the table name consists of three words, you might as well take one of the first two words, then two letters from the last word, and the result is still a four-letter alias, and so on. For worksheets, the name of the table can be prefixed with WORK_, followed by the name of the application that uses the table. The columns [fields] in the table should adopt a set of key design rules. For example, if the key is a number, you can use _N as the suffix; If it is a character type, you can use the _C suffix. Column [field] names should use standard prefixes and suffixes. For another example, if there are many "money" fields in your table, you might as well add a _M suffix to each column [field]. Also, the date column [field] should start with D_.

Check the naming convention among table name, report name and query name. You may soon be confused by the names of these different database elements. If you insist on naming the different components of these databases uniformly, you should at least use prefixes such as Table, Query or Report at the beginning of these object names to distinguish them.

If Microsoft Access is adopted, you can identify objects (such as tbl _ employees) with symbols such as qry, rpt, TBL and mod. I also use tbl to index tables when dealing with SQL Server, but I use sp_company (now sp_feft_) to identify stored procedures, because sometimes if I find a better way to deal with it, I often save several copies. When I implemented SQL Server 2000, I used udf_ (or similar tags) to identify the functions I wrote.

a workman must first sharpen his tools if he is to do his work well

Using ideal database design tools, such as PowerDesign of SyBase, supporting languages such as PB, VB and Delphe, we can connect more than 30 popular databases in the market through ODBC, including dBase, FoxPro, VFP and SQL Server. I will focus on the use of PowerDesign in the future.

Get the data schema resource manual

Those who are looking for sample patterns can read the Data Pattern Resource Manual, written by Len Silverston, W. H. Inmon and Kent Graziano, which is the best data modeling book worth owning. This book includes chapters covering various data fields, such as people, institutions and work efficiency.

You can also refer to others: Introduction to Database System by Sashixuan Susan Wang.

Think about the future, but don't forget the lessons of the past.

I find it very useful to ask users what they think of future changes in demand. This can achieve two purposes: first, you can clearly understand where the application design should be more flexible and how to avoid performance bottlenecks; Secondly, you know that users will be as surprised as you when there is a demand change that is not determined in advance.

Be sure to remember the past experience and lessons! We developers should also help each other by sharing our own experiences and experiences. Even if users think they no longer need any support, we should educate them in this respect. We have all faced the moment: "If only we had done this …".

Design logic before physical practice.

Before going deep into physical design, logical design should be carried out. With the emergence of a large number of CASE tools, your design can reach a fairly high level of logic, and you can usually better understand all aspects of database design as a whole.

Know your business

Don't even add a data table to your ER (Entity Relationship) model until you 100% are sure that the system meets its needs from the customer's point of view (why don't you have a model yet? Then please refer to Tip 9). Knowing your business can save a lot of time in the later development stage. Once you know the business requirements, you can make many decisions by yourself.

Once you think you have defined the business content, you'd better have a systematic communication with customers. Use customer terminology and explain to them what you think and hear. At the same time, the relational cardinality of the system should be expressed by words such as possibility, will and must. In this way, you can ask your customers to correct your own understanding, and then do the next ER design.

Create data dictionary and ER chart

Be sure to take some time to create ER charts and data dictionaries. It should at least contain the data type of each field and the primary foreign key in each table. It takes time to create ER charts and data dictionaries, but it is absolutely necessary for other developers to understand the whole design. The earlier it is created, the more it will help to avoid possible confusion in the future, so that anyone who knows the database can clearly understand how to obtain data from the database.

The importance of the latest documents cannot be overemphasized, such as ER chart, which is very useful for showing the relationship between tables, while the data dictionary shows the purpose of each field and any possible aliases. This is absolutely necessary for documents of SQL expressions.

Create mode

A chart is worth a thousand words: developers should not only read and implement it, but also use it to help them communicate with users. Schema helps to improve the efficiency of collaboration, so it is almost impossible to have big problems in the early database design. The pattern need not be complicated; It can even be as simple as writing on a piece of paper. Just to ensure that the logical relationship can produce benefits in the future.

Start with input and output.

When defining database table and field requirements (inputs), you should first examine existing or designed reports, queries and views (outputs) to determine which tables and fields are necessary to support these outputs. For a simple example, if a customer needs a report to sort, segment and sum by postal code, you should ensure that it contains a separate postal code field instead of putting the postal code in the address field.

Reporting skills

To find out how users usually report data: batch processing or online submission? Is the interval daily, weekly, monthly, quarterly or yearly? You can also consider creating a summary table if necessary. The primary key generated by the system is difficult to manage in the report. When searching in a table with a system-generated primary key, users often return a lot of duplicate data. This kind of retrieval has low performance and is easy to cause confusion.

Understand customer needs

It seems that this should be obvious, but the demand comes from customers (from the perspective of internal and external customers here). Don't rely on the needs written by users, the real needs are in the minds of customers. You should ask customers to explain their requirements, and as development continues, you should always ask customers to ensure that their requirements are still in the purpose of development. An unchangeable truth is that "you don't know what you want until you see it" will inevitably lead to a lot of rework, because the database does not meet the requirements standards that customers have never written down. To make matters worse, your explanation of their needs belongs only to you, and it may be completely wrong.

Part 2-Designing Tables and Fields to Check for Changes

When I design a database, I will consider which data fields may change in the future. For example, surnames are like this (pay attention to westerners' surnames, such as women taking their husbands' surnames after marriage, etc. Therefore, when building a system for storing customer information, I tend to store the last name field in a separate data table and add fields such as start date and end date, so that I can track the changes of this data item.

Use meaningful field names

Once I participated in the development of a project with a program inherited from other programmers. Programmers like to name fields with data instructions displayed on the screen, which is not bad, but unfortunately, she also likes to use some strange naming methods, using the combination of Hungarian naming and control serial number, such as cbo 1, txt2, txt2_b and so on.

Unless you are using a special abbreviated field name system, please describe these fields as clearly as possible. Of course, don't overdo it, such as customer _ shipping _ address _ street _ line _1. Although it is very descriptive, no one wants to type such a long name. The specific scale is up to you.

Use prefix naming

If there are many fields of the same type in multiple tables (such as FirstName), you may wish to use the prefix of a specific table (such as CusLastName) to help identify the fields.

The timeliness data should include the "Last Updated Date/Time" field. Timestamps are especially useful for finding the cause of data problems, reprocessing/reloading data by date, and clearing old data.

Standardization and data-driven

The standardization of data is not only convenient for yourself, but also for others. For example, if your user interface wants to access external data sources (files, XML documents, other databases, etc.). ), you might as well store the corresponding connection and path information in the user interface support table. In addition, if the user interface performs tasks such as workflow (sending mail, printing stationery, modifying record status, etc.). ), then the data that generates the workflow can also be stored in the database. Pre-arrangement always requires hard work, but if these processes are data-driven rather than hard-coded, then policy changes and maintenance will be much more convenient. In fact, if the process is data-driven, you can put considerable responsibility on users, who will maintain their own workflow.

Standardization should not be excessive.

For those unfamiliar with the word standardization, standardization can ensure that the fields in the table are the most basic elements, which helps to eliminate data redundancy in the database. There are several forms of standardization, but the third normal form (3NF) is generally considered as the best balance between performance, scalability and data integrity. Simply put, 3NF stipulates that:

Each value in the table can only be represented once. Each row in the table should be uniquely identified (with a unique key). Non-key information that depends on other keys should not be stored in the table. A database conforming to the 3NF standard has the following characteristics: there is a group of tables dedicated to storing relevant data, which are connected by keys. For example, a 3NF database that stores customers and their related orders may have two tables: customer and order. The Order table does not contain any information about the Customer associated with the order, but a key value will be stored in the table, which points to the row containing the customer information in the customer table.

The degree of standardization is also higher, but is the higher the degree of standardization necessarily better? The answer is not necessarily. In fact, for some projects, even 3NF may bring too much complexity to the database.

In order to be efficient, sometimes irregular forms are needed, and there are many examples. There was once a job to develop catering analysis software, which used non-standardized forms to reduce the query time from an average of 40 seconds to about two seconds. Although I have to do this, I will never regard the non-standardization of data tables as a natural design concept. And the specific operation is only a derivative. So if something goes wrong with the table, it is entirely possible to reproduce the non-standardized table.

Inactivity or inactivity indicator

It is useful to add a field to indicate whether the record is no longer valid in the business. Whether it's customers, employees or anyone else, this helps to filter the active or inactive status when you run the query again. At the same time, it also eliminates some problems faced by new users when adopting data. For example, some records may no longer be used, and deleting them can play a preventive role.

Use role entities to define columns [fields] that belong to a category.

When you need to define something that belongs to a specific category or has a specific role, you can use role entities to create a specific time association, thus achieving self-documentation.

The meaning here is not to let the PERSON entity have the Title field, but to say, why not use the PERSON entity and the PERSON_TYPE entity to describe people? For example, when John Smith, an engineer, is promoted to John Smith, a director, and finally climbs to the position of John Smith, a CIO, all you have to do is change the key value of the relationship between the two tables PERSON and PERSON_TYPE, and add a date/time field to know when the change occurred. In this way, the PERSON_TYPE table contains all possible personnel types, such as assistant, engineer, supervisor, CIO or CEO.

Another alternative is to change the personnel record to reflect the change of the new title, but it is impossible to track the specific time of the individual's location in time.

Use public entities to name institutional data.

The easiest way to organize data is to use common names, such as person, organization, address and phone number. When you combine these common common names or create specific corresponding sub-entities, you get a special version for yourself. The reason why common terms were adopted from the beginning is mainly because all concrete users can concretize abstract things.

With these abstract representations, you can use your own special name in the secondary logo. For example, people can be employees, spoons, patients, customers, customers, vendors or teachers. Similarly, the organization can also be my company, my department, competitors, hospitals, warehouses, governments, etc. Finally, the address can be a specific place, location, home, work, customer, supplier, company and field office.

Using general abstract terms to identify the category of "things" can give you great flexibility in associating data to meet business needs, and at the same time, it can significantly reduce the redundancy required for data storage.

Users come from all over the world.

When designing a database with internet or other international characteristics, it must be remembered that most countries have different field formats, such as postal codes, while some countries (such as New Zealand) do not have postal codes.

Data replication requires the use of a separate data table.

If you find yourself entering data repeatedly, please create a new table and a new relationship.

Three useful fields that should be added to each table

DRecordCreationDate, which is Now () by default in VB, GETDATE()sRecordCreator by default in SQL Server and not null by default in SQL Server, is the version mark of the record; It is helpful to accurately explain why there is empty data or missing data in the record. Use multiple fields for address and phone number.

It is not enough to describe the street address with only one line. Address_Line 1, Address_Line2 and Address_Line3 can provide greater flexibility. In addition, phone numbers and e-mail addresses should have their own data tables, with their own types and tag categories.

Be careful of over-standardization, which may cause performance problems. Although the separation of address and telephone list can usually achieve the best state, it may be more appropriate to store "preferred" information (such as customers, etc.) if such information needs to be accessed frequently. ) in its parent table. The compromise between non-standardization and accelerated access makes sense.

Use multiple name fields

I am surprised that many people leave a field as their name in the database. I think only novice developers will do this, but in fact this practice is very common on the Internet. I suggest that the last name and the first name should be treated as two fields, and then merged when querying.

What I often use is to create a calculated column [field] in the same table, through which standardized fields can be automatically connected, so that when the data changes, it will also change. However, you must be smart when using modeling software. In short, the connection field can effectively isolate the user application from the developer interface.

Pay attention to mixed case object names and special characters.

One of the things that annoyed me most in the past was that there were mixed case object names in the database, such as CustomerData. This problem exists in the access to Oracle database. I don't like to use this mixed case object naming method, and the result is to rename it manually. Do you think this database/application can be mixed to the day when a more powerful database is adopted? All uppercase and underlined names are more readable (CUSTOMER_DATA). Never leave spaces between the characters of an object name.

Be careful to keep the words.

Make sure that your field names do not conflict with reserved words, database systems or common access methods. For example, there is a table in my recently written ODBC connection program, in which DESC is used to describe the field name. The consequences can be imagined! DESC is a reserved word after descending abbreviation. A SELECT * statement in the table can be used, but I get a lot of useless information.

Keep the consistency of field names and types.

Be consistent when naming and specifying data types for fields. If the field name in one table is "agreement_number", do not change the name to "ref 1" in another table. If the data type in one table is an integer, do not change it to a character type in another table. Remember that after you finish your work, others will use your database.

Choose the number type carefully.

Be very careful when using smallint and tinyint types in SQL. For example, if you want to see the total monthly sales, and your total field type is small, then if the total exceeds $32,767, you cannot calculate it.

delete flag

Include a Delete Mark field in the table so that rows can be marked for deletion. Do not delete any rows in the relational database; It is best to use a data cleaner and be careful to maintain the integrity of the index.

Avoid using triggers.

The function of the trigger can usually be realized in other ways. When debugging a program, triggers may become interference. If you really need to use a trigger, you'd better focus on recording it.

Include version mechanism.

It is suggested that you introduce version control mechanism into the database to determine the version of the database being used. You must meet this requirement in any case. As time goes by, users' needs will always change. You may need to modify the database structure eventually. Although you can determine the version of the database structure by checking new fields or indexes, I find it more convenient to store the version information directly in the database.

Leave enough margin for the text field.

Text fields of ID type, such as customer ID or order number, should be set larger than usual, because you may be embarrassed to add extra characters in a short time. For example, suppose your customer ID is 10 digits long. Then, you should set the length of the database table field to 12 or 13 characters. Is this a waste of space? A little, but not as much as you think: if a field is extended by three characters, there are 1 10,000 records, plus a little index, it will only make the whole database occupy more space of 3MB. However, this extra space does not need to rebuild the whole database in the future to realize the growth of database scale. The best and most painful example is that the ID number is changed from 15 digits to 18 digits.

Naming skills of columns [fields]

We found that if we use a uniform prefix for the column [field] names of each table, it will be greatly simplified when writing SQL expressions. It does have some shortcomings, such as destroying the function of automatic table connection tools, which link public * * * column [field] names with some databases, but even these tools sometimes connect incorrectly. To give a simple example, suppose there are two tables:

Customers and orders. The prefix of the customer table is cu_, so the sub-segment names in the table are as follows: cu_name_id, cu_surname, cu_initials and cu_address. The prefix of the order table is or _, so the name of the subsection is:

Or_order_id, or_cust_name_id, or_quantity and or_description, etc.

The SQL statement for selecting all data from the database can be written as follows:

1Select*FromCustomer, OrderWherecu _ surname = "MYNAME2 and Cu _ name _ id = or _ cust _ name _ idodor _ quantity =1If there are no prefixes, write it like this (distinguished by aliases):

1Select*FromCustomer,order where customer . surname = " my name ";