Who knows about data management?
Data management - Contents

1 Definition

2 Management stages

1. Manual management stage  2. File system stage  3. Database system stage

3 Application-oriented data management

Concept of data management for data applications  Objects of data management for data applications

4 Anti-money laundering

5 Fields used for comparison in the AML program

1 Definition

Data management [1] is the process of effectively collecting, storing, processing and applying data using computer hardware and software technology. Its purpose is to bring the value of data into full play, and the key to effective data management is data organization. With the development of computer technology, data management has gone through three stages: manual management, the file system and the database system. The data structures established in a database system describe the internal relationships between data more completely, make data easier to modify, update and extend, and at the same time ensure the independence, reliability, security and integrity of the data while reducing redundancy, thereby improving data sharing and the efficiency of data management.

2 Management stages

1. Manual management stage

Before the mid-1950s, computers were mainly used for scientific computation. Data management at this stage had the following characteristics: (1) Data was not saved. Because computers were mainly used for scientific computation, data generally did not need to be kept for long; it was entered when a problem was being computed and discarded when the computation was finished. This was true not only of user data but sometimes of system software as well. (2) Data was managed by the application. Data had to be designed, interpreted and managed by the application program itself; there was no dedicated software system for managing data. (3) Data was not shared. Data was application-oriented, and a data set could serve only one program, so there was a great deal of redundancy across programs. (4) Data was not independent. Whenever the logical or physical structure of the data changed, the application program had to be modified accordingly, which increased the burden on programmers.

2. File system stage

From the late 1950s to the mid-1960s, direct-access storage devices such as disks and drums appeared in hardware, and operating systems began to include dedicated data management software, generally called a file system. This approach supported not only batch processing but also online, real-time processing. Managing data with a file system has the following characteristics: (1) Data can be kept for a long time. Because large volumes of data are used in data processing, the data must be kept in external storage for long periods so that it can be queried, modified, inserted and deleted repeatedly. (2) Data is managed by the file system. At the same time, the file system has shortcomings, chiefly poor data sharing and high redundancy. In a file system, a file basically corresponds to one application; that is, files are still application-oriented. When different applications need some of the same data, each must still create its own file; they cannot share the same data, which makes the data redundant and wastes storage space. Because the same data is stored repeatedly and managed separately, inconsistencies arise easily, making data modification and maintenance difficult.

3. Database system stage

Since the late 1960s, the scale of the objects managed by computers has grown larger and larger, the range of applications has widened, and data volumes have increased dramatically. At the same time, the demand for multiple applications and languages to share data sets has grown, and database technology emerged in response.
Managing data with a database system has obvious advantages over a file system. The move from file systems to database systems marked a leap in data management technology.

3 Application-oriented data management

The data management described above has gone through three stages: manual management, file management and database management. It is essentially the process of effectively collecting, storing, processing and applying data using computer hardware and software technology. As information technology advances, management information systems provide business support for large organizations, covering not only every kind of business within the organization but the whole organization itself (global or national). Data management, as the core function of a management information system, therefore enters a new stage: data management oriented to data applications.

Concept of data management for data applications

Data management for data applications is the management of data resources. According to the DAMA definition, "Data resource management is committed to developing appropriate structures, strategies, practices and procedures to deal with the enterprise data life cycle." This is a high-level, broad definition that does not necessarily involve the concrete operations of data management (from Wikipedia). By comparison, the Baidu Encyclopedia definition targets data management within the process of applying data, i.e. traditional data management, while the Wikipedia definition targets the management of the data involved across the whole enterprise data life cycle, i.e. the management of data change, or the management of the data (metadata) that describes data. This is what we call application-oriented data management here.

According to management theory, a team of a few people can rely on self-discipline; dozens of people need a manager; hundreds of people need a management team; and thousands or tens of thousands of people must rely on computer-assisted management. Enterprises and institutions that cover a whole country are typically divided into headquarters, provincial, municipal and grassroots organizations. At every level there are departments directly engaged in the corresponding business as well as management and support departments engaged in it only indirectly (personnel, office administration, logistics, audit, etc.); each department in turn consists of a number of employees as management objects. A series of rules and regulations is formulated to regulate and constrain the activities and behavior of these management objects: institutions, departments and personnel. In the same way, as the objects of data management (the data) grow, the management mode (stage) must also advance. The overall programme of a large management information system is generally divided into an overall integration project, subprojects and sub-subprojects, with each subproject containing several internal project groups, and so on. Each management level involves business functions that directly serve the business (transactions, accounting, administration, presentation of results, etc.) and non-business functions that do not directly serve it (definition, configuration, monitoring, analysis, recording, scheduling, etc.).
Each business and non-business function consists of a number of data objects (processes, forms, data items, algorithms, metadata, logs, etc.). At the same time, a series of systems, rules and standards must be formulated to constrain the activities and changes of these management objects: projects, functions and data. It can be seen that traditional data management focuses on data objects such as processes, forms, data items and algorithms that directly serve specific business needs. Application-oriented data management also covers the data that describes those application objects, such as the metadata corresponding to processes, forms, data items and algorithms, the files that record the results of data changes, and the logs that record running status, i.e. data that serves the business only indirectly. In this way it manages the processes of loading, changing, recording and reusing the various application business requirements. See the data space diagram below.

Objects of data management for data applications

The data objects managed by data management for data applications are mainly the metadata that describe the attributes of application system components, including processes, files, archives, data elements (items), codes, algorithms (rules, scripts), models, indicators, physical tables, ETL processes, running-state records, and so on. Metadata in the usual sense is data about data, that is, information describing the attributes of data. This information includes identification attributes such as name, identifier, synonyms and context; technical attributes such as data type, data format, value range and unit of measure; management attributes such as version, registration authority, submitting authority and status; and relationship attributes such as classifications, relationships, constraints, rules, standards, specifications and processes. The metadata involved in data management for data applications mainly describes the attributes of application system components. Besides the traditional metadata attributes, each kind of component has attributes of its own, such as the participants and steps of a process, the deployment attributes of a physical table, the source and target attributes of an ETL job, or the algorithm and factor attributes of an indicator. Each component corresponds to one or more metamodels (one per component classification); a metamodel is the standard for metadata, and each piece of metadata should follow the definition of its corresponding metamodel. For example, every data item (element) has a name, identifier, data type, data format, publication status, registration authority and other attributes, and the collection of these attributes is the metadata of that data item. The rules constraining which attributes of a data item are described, and how each attribute should be described, form the metamodel. The e-government data element standard (GB/T 19488.1-2004) is a metamodel for e-government data items.

Traditional metadata management usually loads metadata through the extraction function of a dedicated metadata management system after the related business has been implemented. Because this approach relies on manually triggered loading or after-the-fact maintenance (recording business attributes later), it is often difficult to capture metadata changes in time and to keep the metadata consistent with reality. Application-oriented data management should instead adopt an active metadata management mode: following the metamodel standards, metadata (local metadata) is loaded through human-computer interaction, and where possible the configuration or executable scripts of the data objects (application system components) are generated at the same time (where this is not possible, the metadata produced by human-computer interaction should still serve as the basis for other tools to generate executable scripts). Whenever a configuration needs to be changed or a script modified, the same human-computer interaction process is used and new metadata is generated synchronously, ensuring that metadata stays consistent with reality. This is the active metadata management mode.
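As an illustration of the relationship between a metamodel and metadata, the following minimal Python sketch defines a hypothetical metamodel for a data item and validates one metadata record against it. The attribute list is an assumption chosen from the attributes named above, not the GB/T 19488.1-2004 standard itself.

```python
# A minimal sketch of metamodel vs. metadata, assuming a simplified attribute set.
# The metamodel lists which attributes a data item's metadata must describe.
DATA_ITEM_METAMODEL = {
    "required": ["name", "identifier", "data_type", "data_format",
                 "publication_status", "registration_authority"],
    "optional": ["synonyms", "context", "value_range", "unit_of_measure", "version"],
}

def validate_metadata(record: dict, metamodel: dict) -> list:
    """Return problems found when checking a metadata record against its metamodel
    (missing required attributes, attributes the metamodel does not define)."""
    allowed = set(metamodel["required"]) | set(metamodel["optional"])
    problems = []
    for attr in metamodel["required"]:
        if not record.get(attr):
            problems.append(f"missing required attribute: {attr}")
    for attr in record:
        if attr not in allowed:
            problems.append(f"attribute not defined in metamodel: {attr}")
    return problems

# Metadata for one hypothetical data item, following the metamodel above.
customer_id_metadata = {
    "name": "Customer ID",
    "identifier": "CUST_ID",
    "data_type": "string",
    "data_format": "18-digit numeric",
    "publication_status": "published",
    "registration_authority": "Head Office Data Management Dept.",
}

print(validate_metadata(customer_id_metadata, DATA_ITEM_METAMODEL))  # -> []
```

In an active metadata management mode, a record like this would be captured through human-computer interaction at the moment the component is configured, rather than extracted after the fact.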

The meaning and method of data management for data applications are shown in the figure below. Traditional application systems are usually built for specific applications and need requirements to be fixed, so they struggle to support a constantly changing management information system. The third phase of the Golden Tax Project, for example, aims to build a management information system for a national organization, covering all management businesses and all users of the entire organization. In such a system, "change" in business requirements is the norm and "no change" is temporary; across the whole organization, business "differences" between departments and levels exist objectively, are gradually "unified", and then expand again as new differences appear. There must therefore be a new generation of enterprise application system (AS2.0) products that can not only implement business requirements but also support changes to them, track and manage those changes, and support continuous optimization of the user experience. AS2.0 must control, record and manage the process and results of change in the business requirements of the whole organization. Data management for data applications is the key foundational component of AS2.0 and the basis of its feasibility.

The data management of a traditional application system focuses on the value-adding process of data: it emphasizes loading business requirements and the ETL, organization, processing and presentation of content, all implemented in hard-coded software. The data management of AS2.0 additionally collects metadata, historical data and status data. It uses active metadata management tools to configure and load software components while collecting the corresponding local metadata into a metadata set, so that changes to all kinds of business requirements can be loaded, captured, recorded, tracked and managed; it standardizes and packages the historical records of content and changes into archives, providing the functions of organizing, reusing and unloading historical data so that history can be managed; and it captures, records, analyzes and reflects the running-state information of every AS2.0 component in real time, so that the running state of the whole system can be managed comprehensively. In short, the extension of data objects to changes, history and state records marks data management's entry into a new stage, data management for data applications, and marks the application system's entry into the AS2.0 era.

4 Anti-money laundering

Data management is the core of anti-money laundering. Financial service providers need to know their customers better than ever before. Money laundering is a major issue that many governments address when fighting crime and terrorism, and they have issued numerous guidelines for financial services institutions operating within their jurisdictions. Data management is at the core of anti-money laundering (AML).

For example, both the EU's Third Anti-Money Laundering Directive and the USA PATRIOT Act place great emphasis on data quality that must be closely watched and strictly managed in the following areas: customer identification, know your customer (KYC), and customer (or enhanced) due diligence.

Informatica data quality solutions in anti-money laundering

Informatica Data Quality includes a desktop workbench for business and compliance data analysts. Its easy-to-use interface allows the users who need the deepest understanding of the data and business processes to create their own data quality rules for identifying potentially suspicious or fraudulent behavior. This ease of use is a key advantage for the enterprise: there is no need to wait for a separate department to build and deploy rules, which would increase the risk of exposure caused by delayed implementation. Companies can now create, deploy and centrally manage rules and respond quickly to changing business conditions.

Informatica Data Quality solutions are also used to cross-reference multiple data sets. This cross-referencing allows an enterprise to identify and verify customer and transaction data against the following lists: watch lists (internal, government and third-party), deceased-persons lists, politically exposed persons (PEP) lists, suppression lists, address data and reference data.
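To make the idea of cross-referencing concrete, here is a minimal Python sketch that screens customer records against a watch list by normalized name and date of birth. The data, field names and matching rule are hypothetical stand-ins for the list-screening described above, not Informatica's implementation.

```python
# A minimal sketch of screening customers against a watch list.
# Matching on normalized name plus date of birth is an illustrative rule only.
def normalize(name: str) -> str:
    return " ".join(name.lower().split())

watch_list = [
    {"name": "John Q. Doe", "dob": "1975-03-14", "source": "government"},
    {"name": "Acme Trading Ltd", "dob": None, "source": "third-party"},
]

customers = [
    {"customer_id": "C001", "name": "JOHN Q. DOE", "dob": "1975-03-14"},
    {"customer_id": "C002", "name": "Jane Roe", "dob": "1980-01-01"},
]

def screen(customers, watch_list):
    """Return (customer_id, watch-list entry) pairs that match on name and DOB."""
    hits = []
    for c in customers:
        for w in watch_list:
            if normalize(c["name"]) == normalize(w["name"]) and c["dob"] == w["dob"]:
                hits.append((c["customer_id"], w))
    return hits

for customer_id, entry in screen(customers, watch_list):
    print(f"ALERT: {customer_id} matches {entry['source']} watch list")
```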

Finally, once the rules have been built, the IT organization can deploy and optimize them and schedule them for regular execution. This kind of automated checking ensures that data is continuously managed by regular, scheduled batch jobs, which makes it well suited to ongoing customer due diligence (CDD) and ad hoc suspicious activity reporting.

Customer identification program rules for AML

Enterprises must know their customers in detail; sales, marketing and finance departments cannot operate effectively without accurate, up-to-date customer data. In the past, various laws and regulations related to data protection already demanded better customer data quality, such as the Bank Secrecy Act (USA) and

HIPAA. However, legislators and regulators have responded to some recent violations with additional compliance measures, including the Sarbanes-Oxley Act, the EU's Third Anti-Money Laundering Directive, the USA PATRIOT Act, MiFID and Solvency

II. Many of these measures point to enterprises' integration needs in the following areas: data governance, data integration, data storage and warehousing, and business intelligence and reporting. Taken together, these rules show a consistent requirement for a managed data quality program. Sometimes the requirement is implicit, but in general it is explicit: a program must be implemented that, for all customers, will: 1. capture identifying information for every customer; 2. verify the customer's identity; 3. inform the customer of the CIP process; 4. identify the information needed to compare customer names with government lists (before an account is opened), e.g. 1. name; 2. street address. (USA PATRIOT Act, Section 326: Customer Identification Program.) To manage their Customer Identification Programs (CIP), many financial institutions rely on Informatica's data quality products. Business analysts use the role-based data quality

workbench to define rules that ensure the data required by the CIP is fit for purpose. In general, the following data quality dimensions need to be measured and reported: Completeness: ensure that all CIP data are populated. Conformity: ensure that all CIP data are in the correct format. Consistency: analyze multiple attributes together to ensure the data is consistent, e.g. currency and country, city and country. Duplication: does this customer already exist? Authenticity: is the customer on a PEP list? Is the customer related to an employee (KYE)? Is the customer related to other customers? Accuracy: ensure the validity of CIP data such as dates, product codes and addresses. Range: does this transaction exceed a certain amount? Does the value of currency transactions in the account exceed a certain level? Analysts can use such reports to quickly identify data discrepancies that need attention in a risk-based CIP, for example the customer's country of residence, the nature of the customer's business, the type of account or banking product, and the number and value of transactions. Whether customer data is screened against the PEP list as it is captured (for example, when a new account is opened) or in batch, highly accurate search results and data quality exception reports can be produced.
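As a rough illustration of how such dimension checks might be expressed as rules, the sketch below applies a few completeness, conformity, consistency and range checks to a single customer record. The field names, formats and thresholds are hypothetical examples, not Informatica rule definitions.

```python
import re

# Hypothetical CIP record; field names and rules are illustrative only.
record = {"name": "Jane Roe", "dob": "1980-01-01", "country": "US",
          "currency": "USD", "product_code": "SAV-01", "txn_amount": 12500.0}

def check_record(r: dict) -> list:
    issues = []
    # Completeness: all CIP fields must be populated.
    for field in ("name", "dob", "country", "product_code"):
        if not r.get(field):
            issues.append(f"completeness: {field} is missing")
    # Conformity: date of birth must be in ISO format.
    if r.get("dob") and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", r["dob"]):
        issues.append("conformity: dob is not YYYY-MM-DD")
    # Consistency: currency should agree with country (toy mapping).
    if r.get("country") == "US" and r.get("currency") not in (None, "USD"):
        issues.append("consistency: currency does not match country")
    # Range: flag transactions above an illustrative reporting threshold.
    if r.get("txn_amount", 0) > 10000:
        issues.append("range: transaction exceeds 10,000 threshold")
    return issues

print(check_record(record))  # -> ['range: transaction exceeds 10,000 threshold']
```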

The same data quality improvement process is generally applied to existing customer or transaction data to improve CDD or to support historical reviews. Reports can be delivered with Informatica or a third-party reporting engine.

5 Fields used for comparison in the AML program

Analysts can also use Informatica Data Quality solutions to compare customers against watch lists in order to fulfil their regulatory requirements.

The fields typically used for comparison in an anti-money laundering program include: first name × year of birth, surname × address, gender × identity number and date of birth. A business analyst can weight each field separately, for example by focusing on the year of birth rather than the complete date of birth.
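The following minimal Python sketch illustrates field-weighted matching of the kind described above; the weights, fields and alert threshold are hypothetical, chosen only to show how a predetermined weight can trigger an AML alert.

```python
# Hypothetical field weights for matching a customer record against a watch-list entry.
WEIGHTS = {"first_name": 0.3, "surname": 0.3, "year_of_birth": 0.2, "address": 0.2}
ALERT_THRESHOLD = 0.7  # illustrative cut-off for raising an AML alert

def match_score(customer: dict, reference: dict) -> float:
    """Sum the weights of the fields on which the two records agree."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        a, b = customer.get(field), reference.get(field)
        if a and b and str(a).strip().lower() == str(b).strip().lower():
            score += weight
    return score

customer = {"first_name": "John", "surname": "Doe",
            "year_of_birth": 1975, "address": "12 High St"}
reference = {"first_name": "John", "surname": "Doe",
             "year_of_birth": 1975, "address": "Unknown"}

score = match_score(customer, reference)
if score >= ALERT_THRESHOLD:
    print(f"potential match (score {score:.2f}) - raise AML alert for review")
```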

In the sample report shown in Figure 2, a list of all potential matches between the customer data set and a reference data set, such as a watch list, can be generated.

AML alerts can be triggered at a predetermined match weight so that the relevant staff review these matches. If the records matched in an AML report turn out not to be a true match, a flag can be set so that future reports do not raise the same match again.

The search and matching process can also be used simply to identify and remove duplicate records in a system. Potential duplicates can be routed through a web-based client to the data quality administrator, who reviews the lower-quality records and selects the master or best record among the duplicates. CIP is a subset of the KYC program, acting as a built-in data quality firewall ahead of the AML algorithms, and both ultimately require the same data quality processes. Usually the data quality administrator will extend the Informatica Data Quality solution's CIP

functions to cover data related to the customer's financial status and investment objectives. Improving and verifying customer data with Informatica's proven techniques helps achieve KYC goals: not only better fraud detection, but also richer customer relationship data for sales and marketing. In effect, this places a built-in data quality firewall in front of the AML algorithms or third-party engine. These firewalls perform two functions: identifying poor-quality data, and standardizing, cleansing and/or enriching data in time, which improves the efficiency of the AML engines and reduces risk. Identifying potential fraud: data quality business rules can be used to detect fraud as early as possible, before the data is loaded into the engine. Figure 4 shows a drill-down view of the data quality firewall. It contains the data quality score recorded item by item (column O), from 0% to 100%.

These scores rank the records so that those with the worst DQ appear first, and business analysts can easily adjust the weighting of the scores to suit the specific situation. The report also highlights CIP/KYC risk levels, all applied to the data using Informatica Data Quality. These risk levels identify the specific data scenarios that may point to fraud, even before the data is loaded into the AML engine.

An important part of an anti-money laundering program: customer due diligence

Comprehensive customer due diligence (CDD) for new and existing customers is an important part of any anti-money laundering program. CDD relies on high-quality CIP and

KYC data and processes, monitors the development of the customer relationship, and feeds its findings into the overall risk-control function. Both the USA PATRIOT Act and the Third Anti-Money Laundering Directive place great importance on CDD.

Informatica Data Quality is also well suited to providing the continuous monitoring needed to meet these requirements. As mentioned above, active data quality management ensures that, over time, the quality of the data used for CDD keeps improving, and with it the efficiency of the process.

Chapter II, Customer Due Diligence, Article 8(1) of the Directive states that customer due diligence measures shall comprise: (a) identifying the customer and verifying the customer's identity...; (b) identifying, where applicable, the beneficial owner and taking risk-based and adequate measures to verify his identity, so that the institution or person covered by this Directive is satisfied that it knows who the beneficial owner is, including, as regards legal persons, trusts and similar legal arrangements, taking risk-based and adequate measures to understand the ownership and control structure of the customer; (c) obtaining information on the purpose and intended nature of the business relationship; (d) conducting ongoing monitoring of the business relationship, including scrutiny of transactions... See Figure 6 for an example of anti-money laundering data quality rules under the EU's Third Anti-Money Laundering Directive.

Telemarketing

In telemarketing, the sales team, the product and the marketing database make up the three essential elements of "who sells", "what is sold" and "who it is sold to". As the point where targeted selling converges, marketing data plays a vital role in telemarketing. How to manage and use this valuable data resource scientifically and consistently is a question every telemarketing manager needs to consider seriously and put into practice. Starting from theory, let us look at the points in telemarketing "data management" that need attention.

The first question: data import

Data needs to be processed before import to ensure that it can be maintained, analyzed and reported on statistically once in use.

First, the raw data attributes need to be analyzed and defined. In telemarketing, data from many different channels is usually called, and each source has its own characteristics. We therefore need to distinguish attributes such as geography (local and remote), gender (male and female), age (different age groups), income (high, middle and low income groups), industry (finance, IT, etc.), and so on. The data attributes are then classified and coded according to these characteristics, and the data is further worked through telephone sales. On that basis we can analyze and find the user groups best suited to the product being sold, completing the prioritized acquisition and selection of data and getting the maximum use out of the data resource. A minimal sketch of such attribute coding follows.
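The sketch below shows one hypothetical way to code lead attributes into segment labels before import; the attribute names, categories and cut-offs (including the assumed "home market" city) are illustrative assumptions, not a prescribed scheme.

```python
# Hypothetical attribute coding for raw telemarketing leads.
def code_lead(lead: dict) -> dict:
    """Attach simple segment codes (geography, age band, income band) to a lead."""
    coded = dict(lead)
    # Assumed home market for the local/remote split.
    coded["geo_code"] = "LOCAL" if lead.get("city") == "Beijing" else "REMOTE"
    age = lead.get("age") or 0
    coded["age_band"] = "18-30" if age <= 30 else "31-45" if age <= 45 else "46+"
    income = lead.get("monthly_income") or 0
    coded["income_band"] = "HIGH" if income >= 20000 else "MID" if income >= 8000 else "LOW"
    return coded

lead = {"name": "Li Lei", "city": "Shanghai", "age": 38, "monthly_income": 15000}
print(code_lead(lead))
# -> {..., 'geo_code': 'REMOTE', 'age_band': '31-45', 'income_band': 'MID'}
```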

Second, there is a seemingly simple but worthwhile job: preprocessing the data before import and deleting invalid records, such as records with too few digits in the contact number, records with the contact number missing, or records whose attributes are identical (duplicates). Because these tasks are done before import, the raw data can be processed in batches, yielding data that conforms to the dialing specification in the most efficient way, while ensuring the accuracy and usefulness of the data allocated to front-line TSRs, saving their time and improving their efficiency. A preprocessing sketch is shown below.
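Here is a minimal Python sketch of such pre-import cleansing, assuming an 11-digit phone number format and exact-duplicate detection; both rules are illustrative assumptions.

```python
import re

# Hypothetical raw leads; the 11-digit phone rule and duplicate rule are assumptions.
raw_leads = [
    {"name": "Wang Fang", "phone": "13800138000"},
    {"name": "Zhao Min", "phone": "1380013"},        # too short: dropped
    {"name": "Chen Jie", "phone": ""},               # missing number: dropped
    {"name": "Wang Fang", "phone": "13800138000"},   # duplicate: dropped
]

def preprocess(leads):
    """Drop records with short or missing numbers and exact duplicates,
    and number the remaining records for backup and later lookup."""
    seen, cleaned = set(), []
    for lead in leads:
        phone = (lead.get("phone") or "").strip()
        if not re.fullmatch(r"\d{11}", phone):
            continue
        key = (lead["name"], phone)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"data_no": len(cleaned) + 1, **lead})
    return cleaned

print(preprocess(raw_leads))  # one record kept, numbered data_no=1
```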

Finally, before the data is officially put into use, it is also advisable to number and back up the original data. Once the data has been handed to TSRs, it will be maintained and updated continuously as the sales work progresses. Whenever the original information for a record needs to be checked, this backup of the original database is what you turn to; because the original data was numbered at the outset, a simple lookup by data number in the original database is all that is required.

After the above processing, we can import the data resources and let telemarketing bring in the profits!

The second question: the use of data

The processed data is unified and orderly after being imported, which is a good start.

Next, let's look at how the data is used. As TSRs work with the marketing data, they carry out a series of data maintenance tasks, including recording and updating the dialing status and the sales status. Let's look at the various dialing statuses and sales statuses and what each of them means to us.

Dialing status: the dialing status records whether the contact information in the marketing data, such as the telephone number, could be connected when it was tried. We can usually label records with the statuses shown in the figure below.

Data labeled with a dialing status carries a deeper meaning: the vitality of the data.

"Cancel" it, and never call it out to take up TSR's time; The data demand of "busy/busy" takes precedence over "wrong number", because this state shows that this phone is still in use, and the possibility of connecting to continue contact will be the greatest! By the way, the data about the need for "continuous contact" should be "dialed the wrong time" The so-called misdialing mainly means staggering working days and non-working days, or staggering the time between day and night. The effective application of data resources can only be realized through the staggered dialing of "working day dialing", "non-working day dialing", "daytime dialing" and "night dialing".

Let's look at the "sales situation". Sales status only refers to three states when the phone is connected and contact data is found:

Success: the telephone sale succeeded. To follow up: the contact needs to think it over, or the sale was not completed and further follow-up is required. Rejected: the contact does not accept the product or service being sold and the telephone sale has failed. These statuses are easy to record while the data is in use. What deserves attention here are the two statuses "to follow up" and "rejected". Looking at the follow-up data, what are the main factors that make users hesitate? Product quality and quantity? Price? Or after-sales service? Once you have this information, you become more familiar with the data attributes and can design sales scripts specifically for users whose status is "to follow up". A simple status model is sketched below.
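A minimal sketch, assuming three sales statuses and a free-text hesitation reason, of how follow-up reasons might be tallied to guide script design; the reason strings are hypothetical.

```python
from collections import Counter
from enum import Enum

class SalesStatus(Enum):
    SUCCESS = "success"
    FOLLOW_UP = "to_follow_up"
    REJECTED = "rejected"

# Hypothetical call outcomes recorded by TSRs; the reason strings are assumptions.
calls = [
    {"status": SalesStatus.FOLLOW_UP, "reason": "price"},
    {"status": SalesStatus.FOLLOW_UP, "reason": "after-sales service"},
    {"status": SalesStatus.FOLLOW_UP, "reason": "price"},
    {"status": SalesStatus.REJECTED, "reason": "no need"},
    {"status": SalesStatus.SUCCESS, "reason": None},
]

def follow_up_reasons(calls):
    """Count why 'to follow up' contacts are hesitating, most common first."""
    return Counter(c["reason"] for c in calls if c["status"] is SalesStatus.FOLLOW_UP)

print(follow_up_reasons(calls).most_common())  # [('price', 2), ('after-sales service', 1)]
```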

Similarly, we need to find the main reasons users reject the offer and, once they are tied back to the corresponding data attributes, take effective measures to improve the sales success rate.

The third question: data allocation

Experience tells us that data should not simply be divided evenly among TSRs, because different TSRs use data differently. When distributing data, we should adjust the allocation in real time according to how each TSR is actually using it.

Two parameters help us regulate the allocation of marketing data: the "successful contact rate" and the "follow-up rate". They are introduced below.

Successful contact rate = number of records where the target contact was reached / number of records where the call was connected × 100%. The successful contact rate is an indicator of how usable the data is: it tells us, among the data already dialed, how many records produced a reachable contact and potential sales target. The successful contact rate changes constantly; with a second dial, a third dial or more, it will rise. To make data usage more effective, a "minimum successful contact rate" can be set. When the successful contact rate of the data allocated to a TSR falls below the target value, the allocation of new data is reduced and the TSR is asked to redial the "busy" and "no answer" records among the unconnected data, raising the successful contact rate and using the data more effectively. A small calculation sketch follows.
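A minimal Python sketch of this control, using the formula above; the 60% minimum threshold is an illustrative assumption.

```python
# Successful contact rate = contacts reached / calls connected * 100%
def successful_contact_rate(contacts_reached: int, calls_connected: int) -> float:
    return 0.0 if calls_connected == 0 else contacts_reached / calls_connected * 100

MIN_CONTACT_RATE = 60.0  # assumed target value

def contact_rate_action(contacts_reached: int, calls_connected: int) -> str:
    rate = successful_contact_rate(contacts_reached, calls_connected)
    if rate < MIN_CONTACT_RATE:
        return f"{rate:.1f}% < {MIN_CONTACT_RATE}%: reduce new data, redial busy/no-answer records"
    return f"{rate:.1f}%: keep allocating new data"

print(contact_rate_action(contacts_reached=45, calls_connected=90))  # 50.0% -> redial first
```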

Follow-up rate = number of records to be followed up / number of records where the contact was reached × 100%. From the formula it is easy to see that the follow-up rate measures how much of the data where a contact was reached still needs to be followed up. When controlling data distribution, a "maximum follow-up rate" needs to be set for this indicator.

Setting the maximum follow-up rate: to make good use of the data resource, make the second sales contact with a hesitating prospect in time, and seize the best follow-up opportunity, the TSR needs to review the follow-up data regularly and make those calls. When the maximum follow-up rate is exceeded, there is too much data awaiting follow-up in the marketing data the TSR has called; at that point the allocation of new data should be reduced so that the TSR can concentrate on following up interested but hesitant prospects. A sketch of this check follows.
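And a matching sketch for the follow-up rate control, again with an assumed threshold (30%).

```python
# Follow-up rate = records awaiting follow-up / contacts reached * 100%
def follow_up_rate(awaiting_follow_up: int, contacts_reached: int) -> float:
    return 0.0 if contacts_reached == 0 else awaiting_follow_up / contacts_reached * 100

MAX_FOLLOW_UP_RATE = 30.0  # assumed maximum follow-up rate

def follow_up_action(awaiting_follow_up: int, contacts_reached: int) -> str:
    rate = follow_up_rate(awaiting_follow_up, contacts_reached)
    if rate > MAX_FOLLOW_UP_RATE:
        return f"{rate:.1f}% > {MAX_FOLLOW_UP_RATE}%: reduce new data, focus on follow-up calls"
    return f"{rate:.1f}%: continue allocating new data"

print(follow_up_action(awaiting_follow_up=20, contacts_reached=50))  # 40.0% -> focus on follow-up
```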

By controlling the "successful contact rate" in marketing data, more contacts can be found, and by controlling the "waiting follow-up rate", more successful sales opportunities can be found. Paying attention to these two indicators is an important part of telemarketing "data management"