Data management is no longer limited to storing and managing data; it has evolved into the many forms of data management that users need. Databases come in many types, from simple tables holding assorted data to large-scale database systems capable of storing massive volumes of data, and they are widely used in every field.
In the information society, the full and effective management and use of information resources is a prerequisite for scientific research and management decision-making. Database technology is the core of information systems such as management information systems, office automation systems and decision support systems, and it is an important technical means for scientific research and decision management.
Definition of database: Definition 1: A database is a warehouse built on computer storage devices that organizes, stores and manages data according to a data structure.
Simply put, it can be regarded as an electronic filing cabinet, a place to store electronic documents. Users can add, retrieve, update and delete the data in those documents.
In the daily work of economic management, it is often necessary to put relevant data into such a "warehouse" and process it according to management needs.
For example, the personnel department of an enterprise or institution often stores basic employee information (job number, name, age, gender, place of origin, salary, resume and so on) in a table, and this table can be regarded as a database. With this "data warehouse", we can look up the basic details of any employee whenever needed, or find the number of employees whose wages fall within a certain range, and so on. If all of these tasks can be done automatically on a computer, personnel management can reach a very high level. Likewise, financial management, warehouse management and production management require many such "databases" in order to automate the management of finance, warehouses and production by computer.
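The two queries mentioned above (looking up one employee, and counting employees within a wage range) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name, column names and sample rows are assumptions, not part of the original text.

```python
import sqlite3


def build_personnel_db():
    """Create an in-memory database with a hypothetical employee table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee ("
        "job_number INTEGER PRIMARY KEY, name TEXT, age INTEGER, salary REAL)"
    )
    # Sample rows are invented for illustration only.
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?, ?)",
        [
            (1001, "Li", 34, 5200.0),
            (1002, "Wang", 45, 7800.0),
            (1003, "Zhao", 29, 4100.0),
        ],
    )
    conn.commit()
    return conn


def find_employee(conn, job_number):
    """Return the basic record of one employee, or None if there is no match."""
    return conn.execute(
        "SELECT job_number, name, age, salary FROM employee WHERE job_number = ?",
        (job_number,),
    ).fetchone()


def count_in_salary_range(conn, low, high):
    """Count employees whose salary lies in the closed range [low, high]."""
    return conn.execute(
        "SELECT COUNT(*) FROM employee WHERE salary BETWEEN ? AND ?",
        (low, high),
    ).fetchone()[0]
```

Calling `find_employee(conn, 1002)` on a connection from `build_personnel_db()` returns that employee's row, and `count_in_salary_range(conn, 4000, 6000)` counts the rows in that wage band.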
Definition 2:
Strictly speaking, a database is an organized and accessible collection of data stored in a computer for a long time. The data in the database is organized, described and stored in a certain data model, which has the characteristics of minimum redundancy, high data independence and easy expansion, and can be shared by multiple users in a certain range.
Such a collection of data has the following characteristics: it contains as little duplication as possible and serves the various applications of a specific organization in an optimal way. Its data structure is independent of the application programs that use it, and the addition, deletion, modification and retrieval of data are managed and controlled by unified software. Historically, the database is an advanced stage of data management that developed out of the file management system. [1] [2]
Database processing system: A database is the general data processing system of an organization or an application field. It stores a collection of related data belonging to enterprises, institutions, groups and individuals. The data in the database is established from a global perspective and is organized, described and stored according to a certain data model. Its structure is based on the natural relationships among the data, so it can provide all necessary access paths. The data no longer serves a single application but the whole organization, and it has an overall structure.
The data in the database is established so that many users can share its information, free from the restrictions of any specific program. Different users can use the data for their own purposes, and multiple users can share the data resources at the same time; that is, different users can access the same data in the database simultaneously. Data sharing not only meets users' requirements for information content but also meets the requirements of information exchange between users.
Basic structure of the database: The basic structure of the database is divided into three levels, reflecting three different perspectives of observing the database.
A database described by the internal schema is called the physical database; a database described by the conceptual schema is called the conceptual database; and a database described by an external schema is called the user database.
(1) Physical data layer.
It is the innermost layer of the database: the collection of data actually stored on physical storage devices. These are the raw data, the objects that users process, and they consist of the bit strings, characters and words handled by the instructions described in the internal schema.
(2) Conceptual data layer.
It is the middle layer of the database and its overall logical representation. It specifies the logical definition of each data item and the logical relationships among the data, and it is a collection of stored records. It concerns the logical relationships among all objects in the database rather than their physical storage, and it is the database as the database administrator sees it.
(3) User data layer.
It is the database that users see and use. It represents the data set used by one or several specific users, that is, a set of logical records.
The different levels of the database are related to one another through mappings.
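The three levels can be illustrated loosely with SQLite, under the assumption that a base table stands in for the conceptual level, a view for the user level, and the storage engine's own file format for the physical level. The table, view and column names are hypothetical.

```python
import sqlite3


def build_three_level_demo():
    # Physical level: how rows are laid out in storage is decided by SQLite itself.
    conn = sqlite3.connect(":memory:")
    # Conceptual level: the overall logical definition of the data.
    conn.execute(
        "CREATE TABLE employee ("
        "job_number INTEGER PRIMARY KEY, name TEXT, salary REAL)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [(1, "Li", 5200.0), (2, "Wang", 7800.0)],
    )
    # User level: a view exposing only what one group of users needs.
    # The view definition is the mapping from user level to conceptual level.
    conn.execute(
        "CREATE VIEW employee_directory AS SELECT job_number, name FROM employee"
    )
    conn.commit()
    return conn


def directory_rows(conn):
    """Read the data as a directory user sees it (no salary column)."""
    return conn.execute(
        "SELECT * FROM employee_directory ORDER BY job_number"
    ).fetchall()
```

A user of `employee_directory` never sees the salary column, and the view keeps working as long as the mapping to the underlying table is maintained.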
Main features of the database: (1) Data sharing.
Data sharing means that all users can access the data in the database at the same time, and that users can use the database in various ways through its interfaces.
(2) Reduced data redundancy.
Compared with the file system, because the database realizes data sharing, users no longer need to create their own application files. This eliminates a large amount of duplicate data, reduces data redundancy and maintains data consistency.
(3) Data independence.
Data independence includes logical independence (the logical structure of the database and the application programs are independent of each other) and physical independence (a change in the physical structure of the data does not affect its logical structure).
(4) Centralized data control.
Under file management, data is scattered: different users, or the same user in different processes, handle their own files with no connection between them. A database allows data to be controlled and managed centrally, and a data model expresses how the various data are organized and how they relate to one another.
(5) Data consistency and maintainability, to ensure the security and reliability of data.
This mainly includes: ① security control, which prevents data loss, erroneous updates and unauthorized use; ② integrity control, which ensures the correctness, validity and compatibility of data; ③ concurrency control, which allows multiple accesses to data within the same period while preventing abnormal interactions between users.
(6) Fault recovery.
The database management system provides a set of methods for detecting and repairing faults in time, thereby preventing data from being destroyed. The database system can recover as quickly as possible from faults that occur during its operation, whether physical or logical, such as data errors caused by faulty system operation.
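Integrity control and recovery from a failed operation can be sketched together using a transaction. The following is a minimal illustration with Python's sqlite3 module; the account table, its CHECK constraint and the function names are assumptions made for the example.

```python
import sqlite3


def make_accounts():
    conn = sqlite3.connect(":memory:")
    # Integrity control: the DBMS itself rejects a negative balance.
    conn.execute(
        "CREATE TABLE account ("
        "id INTEGER PRIMARY KEY, "
        "balance REAL NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; undo every step if any step fails."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The credit to dst was rolled back together with the failed debit.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))
```

When a transfer would overdraw the source account, the constraint fails, the whole transaction is rolled back, and the balances remain exactly as they were before the attempt.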
Types of database: Databases are usually divided into three types: hierarchical, network and relational. Each type connects and organizes data according to a different data structure.
1. Data structure model
(1) Data structure
A data structure is the organizational form of data, or the relationships among data.
If D denotes the data and R denotes the set of relationships among the data objects, then DS = (D, R) is called a data structure.
For example, consider a telephone directory that records the names and telephone numbers of N people. To make a number easy to find, the entries are arranged in dictionary order by name, with each name followed by its telephone number. To find a person's number (say the name begins with Y), you only need to look among the names that start with Y. In this example, the data set D consists of the names and telephone numbers, the relationship R is the dictionary ordering, and the corresponding data structure DS = (D, R) is an array.
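The telephone directory example can be written out as a short sketch: D is a list of (name, number) pairs and R is the dictionary order on names, which is what makes binary search possible. The names and numbers are invented for illustration.

```python
from bisect import bisect_left

# D: (name, phone) pairs. R: dictionary (lexicographic) order on the names.
directory = sorted(
    [
        ("Yang", "555-0101"),
        ("Adams", "555-0102"),
        ("Young", "555-0103"),
        ("Baker", "555-0104"),
    ]
)
names = [name for name, _ in directory]


def lookup(name):
    """Binary search by name; relies on the ordering R."""
    i = bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return directory[i][1]
    return None


def names_starting_with(prefix):
    """Jump to the first candidate, then scan only the matching run."""
    result = []
    for candidate in names[bisect_left(names, prefix):]:
        if not candidate.startswith(prefix):
            break
        result.append(candidate)
    return result
```

Because the array is kept in dictionary order, finding the names that begin with Y touches only the entries in that run instead of the whole directory.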
(2) Types of data structure
Data structures are divided into logical structures and physical structures.
The logical structure of data describes data from a logical point of view (that is, how data are connected and organized), regardless of where the data are stored. The physical structure of data is the structure in which data are stored in the computer, that is, the implementation of the logical structure in the computer; the physical structure is therefore also called the storage structure.
Only the logical structure of data is considered here. The method used to represent and implement the connections among data is called a data model.
At present there are three popular data models: the hierarchical and network structure models, which are based on graph theory, and the relational structure model, which is based on relational theory.
2. Hierarchical, network and relational database systems
(1) Hierarchical structure model
The hierarchical structure model is essentially a directed, ordered tree with a root node (mathematically, a "tree" is defined as an acyclic connected graph). The figure below shows the organizational structure of an institution of higher learning. This organization chart is like a tree. The school is the root (called the root node); departments, majors, teachers and students are branches (called nodes); the connections between the root and the branches are called edges; and the relationship between root and branches is 1:N, that is, there is exactly one root and N branches.
A database system built according to the hierarchical model is called a hierarchical database system. IMS (Information Management System) is its typical representative.
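A hierarchical structure of this kind can be sketched as a tree in which every node has exactly one parent and any number of children. The class and the sample node names below are hypothetical and are not taken from IMS.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, name):
        """Attach a child; each child has exactly one parent (the 1:N rule)."""
        child = Node(name, parent=self)
        self.children.append(child)
        return child


def path_from_root(node):
    """Follow parent links upward; in a tree this path is unique."""
    parts = []
    while node is not None:
        parts.append(node.name)
        node = node.parent
    return " / ".join(reversed(parts))
```

For example, building `School`, then a department under it, then a major under the department, gives a single access path from the root to the major, which is how data in a hierarchical database is reached.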
(2) Network structure model
A database system built according to the network data structure is called a network database system; its typical representative is the DBTG system (named for the Database Task Group). A network data structure can be transformed into a hierarchical data structure by mathematical methods.
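The difference from a tree is that a record in a network may have more than one owner. The sketch below uses a plain dictionary of owner-to-member links and shows one simple way, assumed here for illustration, of turning an acyclic network into trees: copying each shared record under every owner. The record names are invented, and this is not the DBTG data definition language.

```python
# Hypothetical owner -> member links. "Course: Databases" has two owners,
# which a strict tree (one parent per node) cannot express directly.
NETWORK = {
    "Teacher Chen": ["Course: Databases"],
    "Student Li": ["Course: Databases", "Course: Algebra"],
    "Course: Databases": [],
    "Course: Algebra": [],
}


def owners_of(network, record):
    """List every owner that links to the given record."""
    return sorted(owner for owner, members in network.items() if record in members)


def to_trees(network, roots):
    """Expand an acyclic network into trees by copying shared records."""

    def expand(name):
        return {name: [expand(member) for member in network[name]]}

    return [expand(root) for root in roots]
```

The price of the transformation is redundancy: the shared course record appears once under each owner.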
(3) Relational structure model
The relational data structure reduces complex data structures to simple relations, that is, two-dimensional tables. For example, the employee records of an organization form such a relation.
A database system built on relational data structures is called a relational database system.
In a relational database, almost all data operations are performed on one or more relational tables, and data management is carried out by sorting, merging, joining or selecting from these tables.
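Selection and joining on relational tables can be sketched with Python's sqlite3 module. The two tables, their columns and the sample rows are assumptions made for the example.

```python
import sqlite3


def build_relations():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
        CREATE TABLE employee (
            job_number INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER
        );
        INSERT INTO department VALUES (10, 'Finance');
        INSERT INTO department VALUES (20, 'Warehouse');
        INSERT INTO employee VALUES (1, 'Li', 10);
        INSERT INTO employee VALUES (2, 'Wang', 20);
        INSERT INTO employee VALUES (3, 'Zhao', 10);
        """
    )
    return conn


def select_by_department(conn, dept_id):
    """Selection: keep only the rows of one table that satisfy a condition."""
    return conn.execute(
        "SELECT name FROM employee WHERE dept_id = ? ORDER BY name", (dept_id,)
    ).fetchall()


def employees_with_department(conn):
    """Join: combine rows of two tables that agree on dept_id."""
    return conn.execute(
        "SELECT e.name, d.dept_name "
        "FROM employee AS e JOIN department AS d ON e.dept_id = d.dept_id "
        "ORDER BY e.job_number"
    ).fetchall()
```

Both operations take tables as input and return a table as output, which is what allows them to be combined freely.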
dBASE is a typical representative of this kind of database management system. A practical application (such as personnel management) sometimes needs several relations. In dBASE, a single relation is called a database (or database file), and the multiple databases built for multiple relations are called a database system. Another important function of dBASE is using and managing databases through command files. The set of command files that accompanies a database system is called a database application system.
Simply put, then, in dBASE terminology a relation is called a database, and several databases form a database system. A database system can in turn give rise to various auxiliary files, on which its application system is built.
A brief history of database development: 1. Technical development of databases
Data management technology emerged as computers increased data processing capability. Its development is closely tied to computer hardware (mainly external storage), system software and the scope of computer applications. It has gone through four stages: manual management, file systems, databases and advanced database technology.
2. The birth of data management
The history of databases can be traced back some fifty years, when data management was very simple. Large numbers of machines sorted, compared and tabulated millions of punched cards, and the results were printed on paper or punched onto new cards. Data management meant physically storing and handling all of these punched cards. In 1951, however, Remington Rand's UNIVAC I computer introduced a tape drive that could read in hundreds of records per second, which triggered a revolution in data management. In 1956, IBM produced the first disk drive as part of the Model 305 RAMAC. The drive had 50 platters, each 2 feet in diameter, and could store 5 MB of data. The greatest advantage of disk is random access, whereas punched cards and magnetic tape can only be accessed sequentially.
1951: The UNIVAC system uses magnetic tape and punched cards for data storage.
Database systems began to emerge in the 1960s. Computers were by then widely used for data management, which placed ever higher demands on data sharing. Traditional file systems could no longer meet these needs, so the database management system (DBMS), which manages and shares data in a unified way, came into being. The data model is the core and foundation of a database system, and every DBMS is built on some data model. Traditional database systems are therefore usually divided, according to their data model, into three categories: network databases, hierarchical databases and relational databases.
The earliest network DBMS was IDS (Integrated Data Store), whose development Charles Bachman and others at General Electric in the United States began in 1961. In 1964, Bachman successfully completed it as the world's first network DBMS, and indeed the first database management system. It laid the foundation for network databases and was widely distributed and used at the time. IDS had features such as a data schema and logging, but it could run only on GE mainframes. The database had only one file, and all tables in the database had to be generated by hand coding. Later, B.F. Goodrich Chemical Company, a customer of GE, eventually had to rewrite the whole system, and named the rewritten system the Integrated Data Management System (IDMS).
The network database model can naturally represent both hierarchical and non-hierarchical things. Before relational databases appeared, network DBMSs were more widely used than hierarchical DBMSs, and network databases occupy an important position in the history of database development.
Hierarchical DBMSs appeared shortly after network databases. The best-known and most typical hierarchical database system is IMS (Information Management System), developed by IBM in 1968 for its mainframes. It was the earliest large-scale database system product developed by IBM. Originating in the late 1960s, it has since evolved to IMS V6, which supports advanced features such as clustering, N-way data sharing and shared message queues. This 30-year-old database product has taken on a new role in connecting today's WWW applications and in business intelligence applications.
In 1973, Cullinane Corporation (later Cullinet Software) began selling an improved version of Goodrich's IDMS and gradually became the largest software company in the world at that time.
Origin of the relational database: Network and hierarchical databases solved the problems of data centralization and sharing, but they still lacked data independence and abstraction. To access either kind of database, users still had to know the storage structure of the data and specify the access path. The relational database, which appeared later, solved these problems well.
In 1970, Dr. E. F. Codd, a researcher at IBM, published the paper "A Relational Model of Data for Large Shared Data Banks" in Communications of the ACM, introducing the concept of the relational model and laying its theoretical foundation. Although Childs had proposed a set-oriented model in 1968, Codd's paper is generally regarded as an epoch-making milestone in the history of database systems. Codd's aim was to build an elegant data model for databases. He later published a series of articles discussing normal form theory and the 12 rules for judging relational systems, giving the relational database a mathematical foundation. The relational model has a rigorous mathematical basis and a high level of abstraction, and it is simple and easy to understand and use. At the time, however, some considered the relational model an idealized data model that could not realistically be used to implement a DBMS; they worried in particular that the performance of relational databases would be unacceptable, and some even saw it as a serious threat to the ongoing standardization of network databases. To promote understanding of the issue, in 1974 the ACM organized a seminar at which supporters and opponents of relational databases, led by Codd and Bachman respectively, held a debate. This famous debate advanced the development of relational databases and eventually made them the mainstream of modern database products.
1969: Edgar F. Codd invents the relational database.
After the relational model was established in 1970, IBM assigned more researchers at its San Jose laboratory to the project that became the famous System R. Its goal was to demonstrate the feasibility of a fully functional relational DBMS. The project ended in 1979, having completed the first DBMS to implement SQL. However, IBM's commitment to IMS kept System R from being put into production, and it was not until 1980 that System R was officially brought to market as a product. There were three reasons for IBM's slow productization: IBM valued reputation and quality and sought to minimize failures; IBM was a large company with a large bureaucracy; and IBM already had a hierarchical database product, so the people involved were unenthusiastic or even opposed.
At the same time, in 1973, Michael Stonebraker and Eugene Wong of the University of California, Berkeley, began developing their own relational database system, Ingres, drawing on the information published about System R. The Ingres project was eventually commercialized by vendors such as Oracle and Ingres in Silicon Valley. Later, both System R and Ingres received the ACM Software System Award for 1988.
In 1976, Honeywell developed the first commercial relational database system, the Multics Relational Data Store. Relational database systems are based on relational algebra. After decades of development and practical use, the technology has become increasingly mature. Representative products include Oracle, IBM's DB2, Microsoft's SQL Server, Informix and ADABAS D.
Database development stages: The development of data management can be roughly divided into the following stages: the manual management stage, the file system stage, the database system stage and the advanced database stage.
Manual management stage
Before the mid-1950s, computer software and hardware were still immature. The only storage devices were magnetic tapes, cards and paper tapes, and on the software side there was no operating system. Computers were used mainly for scientific calculation. Because there was no software system to manage data, programmers had to specify not only the logical structure of the data but also its physical structure, including the storage structure, access methods and input/output methods. When the physical organization of the data or the storage device changed, the user program had to be rewritten. Because data was organized around individual applications, different programs could not share data, so a great deal of data was duplicated across applications and it was difficult to keep the data consistent between them.
The main features of this stage can be summarized as follows:
(1) The computer has no software to support data management, and the computer system provides no function for managing user data. Each application contains all the data it needs. When writing a program, the user must take full account of the related data, including its definition, storage structure and access method. Program and data form an inseparable whole: without the program the data has no value, and the data has no independence.
(2) Data cannot be shared. Each program has its own data. Even when different programs use the same set of data, the data cannot be shared; each program must still include the data separately, and no part can be omitted. Because data cannot be shared, a large amount of data is inevitably duplicated between programs, wasting storage space.
(3) Data cannot be saved separately. The logical and physical structure of the data is specified in the program, so data and program are not independent. Since data and program form a whole, the data is used only by that program and has value only when saved together with it. The data of a program is therefore not saved separately. Data is processed in batch mode.
File system stage: The main sign of this stage is that the computer now has software for managing data: the operating system (file management).
From the mid-1950s to the mid-1960s, the appearance of large-capacity direct-access storage devices such as hard disks and magnetic drums promoted the development of software technology, and operating systems and high-level software appeared. The file system in the operating system is the data management software that manages external storage, and the operating system gives users a friendly interface for working with files. The arrival of the operating system marked a new stage in data management. In the file system stage, data is stored in external storage in units of files and managed by the operating system. Files are an important resource managed by the operating system.
Data management in the file system stage has the following characteristics:
Advantages
(1) Data can be stored for a long time on external disks in the form of "files". As computer applications turned to information management, files needed to be queried, modified and inserted into frequently.
(2) The logical structure of data differs from its physical structure, and programs are separated from data, which gives data and programs a degree of independence, though only a simple one. The logical structure of data is the form in which the data is presented to users. The physical structure of data is its actual storage structure on the computer's storage devices. There is "device independence" between program and data: a program can process data using only the file name, regardless of the data's physical location. The access methods (read and write) are provided by the file system of the operating system.
(3) File organization is diversified. There are indexed files, linked files and direct-access files. However, files are independent of one another and lack connections; relationships between data have to be built by programs.
(4) Data no longer belongs to one specific program and can be reused. However, the file structure is still designed for a specific purpose, and programs are written against a specific physical structure and access method, so the dependence between programs and data structures has not fundamentally changed.
(5) The user's programs and data can be stored separately in external storage, and applications can share a set of data, giving a file system in which data is shared in units of files.
(6) Data is manipulated record by record. This is because a file stores only the data, not a description of the structure of its records. Creating, accessing, querying, inserting into, deleting from and modifying files must all be done by programs.
(7) Data processing methods include batch processing and online real-time processing.
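Point (6) can be illustrated with a small sketch of record-based file access. The file holds only raw bytes; the record layout lives in the program, so every program that touches the file must repeat it. The layout and field names below are assumptions chosen for the example.

```python
import io
import struct

# The record layout exists only in the program, not in the file:
# 4-byte job number, 20-byte name, 8-byte salary (little-endian, no padding).
RECORD = struct.Struct("<I20sd")


def write_records(f, rows):
    """Append fixed-length records to a binary file-like object."""
    for job_number, name, salary in rows:
        f.write(RECORD.pack(job_number, name.encode("utf-8"), salary))


def read_record(f, index):
    """Direct access: seek to the record's offset and decode it."""
    f.seek(index * RECORD.size)
    job_number, raw_name, salary = RECORD.unpack(f.read(RECORD.size))
    return job_number, raw_name.rstrip(b"\x00").decode("utf-8"), salary
```

If the layout changes, for instance by widening the name field, every program that reads the file must be changed as well, which is the dependence between program and data structure described in point (4).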
Disadvantages
Although the file system greatly improved the computer's ability to manage data, as the scale of data management grew and the amount of data increased sharply, the file system also showed some defects. The problems are as follows:
(1) Data files are designed to meet the particular needs of one department in one business field. Data and programs depend on each other, and data lacks sufficient independence.
(2) There is no centralized mechanism for managing data, so its security and integrity cannot be guaranteed, and data maintenance is still left to the applications.
(3) The organization of data is still program-oriented, and data depends heavily on programs. The logical structure of the data cannot easily be modified or extended, and every slight change to it affects the application programs. Moreover, files lack connections with one another and so cannot reflect the connections among things in the real world. Because the operating system is not responsible for maintaining connections between files, each application ends up with its own corresponding files; any connection between the contents of files can only be handled by the applications, and the same data may be stored repeatedly in multiple files. All of this causes a great deal of data redundancy.
(4) Existing data files are difficult to extend and port, and it is difficult to add or delete data items to meet new application requirements.
Database system stage: At the end of the 1960s, as computers came into wide use for data management, people placed higher demands on data management technology. They wanted data to be organized for the enterprise or department as a whole, with less redundancy, better data sharing and greater independence between programs and data, so that a change in the logical structure of the data would neither involve its physical structure nor affect the application programs, thereby reducing the cost of developing and maintaining applications. Database technology developed in response to these requirements.
To sum up, the data management in the database system stage has the following characteristics:
(1) Data models are used to represent complex data structures. A data model describes not only the characteristics of the data itself but also the relationships among the data, which are realized through access paths. Representing the natural connections among data through access paths is the fundamental difference between databases and traditional files. As a result, the data no longer serves one or a few specific applications but the whole application system. An enterprise or department, for example, organizes its data in a data-centered way to form a comprehensive database shared by all applications.
(2) Because the data serves the whole application system, redundancy is low, the data is easy to modify and extend, and data sharing is achieved. Different applications obtain the data they need from the database according to their processing requirements, which reduces duplicate storage, makes it easy to add new data structures and helps keep the data consistent.
(3) The unified management and control of data provides data security, integrity and concurrency control.
(4) The program and data are highly independent. The logical structure and physical structure of data can be very different, and users can manipulate data with simple logical structure without considering the physical structure of data.
(5) With a good user interface, users can develop and use databases conveniently.
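The independence of program and data described in point (4) can be sketched as follows: an existing query keeps working, unchanged, after the table gains a new column (a logical change) and a new index (a physical change). The table, the index name and the sample rows are assumptions made for the example.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee ("
        "job_number INTEGER PRIMARY KEY, name TEXT, salary REAL)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [(1001, "Li", 5200.0), (1002, "Wang", 7800.0), (1003, "Zhao", 4100.0)],
    )
    conn.commit()
    return conn


def well_paid_names(conn):
    """The 'existing application': it names only the columns it needs."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT name FROM employee WHERE salary > 5000 ORDER BY name"
        )
    ]


def evolve_schema(conn):
    # Logical change: a new column added for some other application.
    conn.execute("ALTER TABLE employee ADD COLUMN place_of_origin TEXT")
    # Physical change: a new access path; queries do not mention it.
    conn.execute("CREATE INDEX idx_employee_salary ON employee (salary)")
    conn.commit()
```

Running `well_paid_names` before and after `evolve_schema` gives the same result without touching the application code, in contrast to the file system stage, where a layout change forces program changes.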
The development from file systems to database systems is a milestone in the information field. In the file system stage, the central concern in information processing was the design of system functions, so program design was dominant. In the database stage, data takes the central position: the design of the data structure becomes the first concern of an information system, and application programs are designed on the basis of the established data structure.
Development trend of databases: With the continuing expansion of what information management covers, various data models (hierarchical, network, relational, object-oriented, semi-structured and so on) and new technologies (data streams, Web data management, data mining and so on) have emerged. Every few years, senior international database experts gather to discuss the state of the field, existing problems and the new technical focuses that will need attention. Reports of this kind include: 1989, Future Directions in DBMS Research: The Laguna Beach Participants; 1990, Database Systems: Achievements and Opportunities; 1991, W. H. Inmon's Building the Data Warehouse; 1995, database.
Common database vendors: 1. SQL Server
It runs only on Windows and offers no openness, and the stability of the operating system is very important to a database. The Windows 9x series is mainly for desktop use, and NT Server is suitable only for small and medium-sized enterprises. Moreover, the reliability, security and scalability of the Windows platform are very limited. It has not been tested as thoroughly as Unix, especially in handling large databases.
2. Oracle
It runs on all major platforms (including Windows), fully supports all industry standards, and follows a completely open strategy, which lets customers choose the most suitable solution and gives developers full support.
3. Sybase ASE
It runs on all major platforms (including Windows). However, because early Sybase versions were poorly integrated with the operating system, versions below 11.9.2 need additional OS and DB patches, and problems can arise in mixed multi-platform environments.
4. DB2
It runs on all major platforms (including Windows) and is best suited to massive data. DB2 is the most widely used database at the enterprise level: among the world's 500 largest enterprises, almost 85% use DB2 database servers, whereas in China the figure in 1997 was only 5%.