What are the big data mining methods?

Direct data mining: the goal is to use the available data to build a model that describes one specific variable (which can be understood as an attribute of a database table, that is, a column) in terms of the remaining data.

Indirect data mining: no specific variable is singled out as the target for the model to describe; instead, the goal is to establish relationships among all the variables.

Data mining methods

Neural network method

Neural networks have attracted more and more attention in recent years because of their robustness, self-organization and adaptability, parallel processing, distributed storage, and high fault tolerance, properties that make them well suited to data mining problems.

Genetic algorithm

A genetic algorithm is a randomized search algorithm modeled on biological natural selection and genetic inheritance; it is a bionic global optimization method. Genetic algorithms are applied to data mining because of their implicit parallelism and because they are easy to combine with other models.
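A minimal sketch of the select, cross over, and mutate loop, assuming a bit-string encoding and a hypothetical fitness function (`fitness`, which simply counts 1 bits, the toy "OneMax" problem). In an actual mining task the fitness would instead score something like a candidate rule or feature subset. Tournament selection, one-point crossover, and elitism are common choices, not the only ones.

```python
import random

def fitness(bits):
    # Hypothetical objective ("OneMax"): the more 1 bits, the fitter.
    return sum(bits)

def genetic_algorithm(n_bits=20, pop_size=30, generations=50,
                      mutation_rate=0.02, seed=0):
    rng = random.Random(seed)  # seeded so runs are reproducible
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]

    def tournament():
        # Selection: the fitter of two randomly drawn individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        # Elitism: carry the current best individual over unchanged.
        next_gen = [max(population, key=fitness)]
        while len(next_gen) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

best = genetic_algorithm()
print(fitness(best), best)
```

Because of elitism, the best fitness in the population never decreases from one generation to the next.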

Decision tree method

The decision tree is a commonly used algorithm for building predictive models. By classifying large amounts of data in a targeted way, it can uncover valuable latent information. Its main advantages are that it is simple to describe and fast at classification, which makes it especially suitable for large-scale data processing.
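The text does not name a specific tree-building algorithm; the sketch below assumes the ID3 approach, which at each node splits on the attribute with the highest information gain (the largest drop in label entropy). The weather-style table is hypothetical toy data.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    # Entropy before the split minus the weighted entropy of the subsets.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    if len(set(labels)) == 1:          # pure node: stop
        return labels[0]
    if not attrs:                      # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attrs if a != best])
    return tree

def classify(tree, row):
    # Follow branches until reaching a leaf (a class label).
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[row[attr]]
    return tree

# Hypothetical training data: play outdoors or not?
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
    {"outlook": "overcast", "windy": "yes"},
]
labels = ["yes", "no", "yes", "no", "yes", "yes"]
tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
print(classify(tree, {"outlook": "sunny", "windy": "yes"}))  # no
```

Here "windy" is chosen as the root because it separates the classes better than "outlook". This sketch raises `KeyError` for attribute values never seen during training; a fuller implementation would fall back to a majority class.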

Rough set method

Rough set theory is a mathematical tool for studying imprecise and uncertain knowledge. The rough set method has several advantages: it requires no additional information beyond the data themselves; it simplifies the representation space of the input information; and its algorithms are simple and easy to apply. The objects that rough set methods process are information tables similar to two-dimensional relational tables.
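The central rough-set construction can be sketched as follows: objects with identical values on the chosen condition attributes are indiscernible and form equivalence classes, and a target set is then bounded by a lower approximation (classes lying entirely inside it) and an upper approximation (classes that overlap it). The patient table, attribute names, and target set are hypothetical.

```python
from collections import defaultdict

def equivalence_classes(table, attrs):
    # Group objects that share the same values on `attrs` (indiscernible objects).
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

def approximations(table, attrs, target):
    lower, upper = set(), set()
    for cls in equivalence_classes(table, attrs):
        if cls <= target:   # certainly in the target set
            lower |= cls
        if cls & target:    # possibly in the target set
            upper |= cls
    return lower, upper

# Hypothetical information table: objects (rows) described by attributes (columns).
table = {
    "p1": {"fever": "yes", "cough": "yes"},
    "p2": {"fever": "yes", "cough": "yes"},
    "p3": {"fever": "no", "cough": "yes"},
    "p4": {"fever": "no", "cough": "no"},
}
flu = {"p1", "p3"}  # hypothetical target concept
lower, upper = approximations(table, ["fever", "cough"], flu)
boundary = upper - lower
print(sorted(lower), sorted(upper), sorted(boundary))
# ['p3'] ['p1', 'p2', 'p3'] ['p1', 'p2']
```

The non-empty boundary shows that these two attributes cannot describe the target exactly: p1 and p2 look identical, yet only p1 belongs to it.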

Covering positive examples and rejecting negative examples

This method finds rules by covering all positive examples while rejecting all negative examples (counterexamples). First, a seed is chosen from the set of positive examples and compared with each negative example in turn: a selector (a condition on a field value) that is compatible with, that is, also satisfied by, the negative example is discarded; otherwise it is retained. Repeating this for all the positive examples yields the rules for the positive examples, each expressed as a conjunction of selectors.
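A simplified greedy sketch of the covering idea, under stated assumptions: a selector is an `attribute == value` test taken from the seed, and at each step the selector that rejects the most negative examples still matched by the rule is kept. This is a sequential-covering simplification, not the full AQ star-generation procedure, and the object descriptions are hypothetical.

```python
def covers(rule, example):
    # A rule is a conjunction of selectors: {attribute: required_value}.
    return all(example[a] == v for a, v in rule.items())

def learn_rule(seed, negatives):
    rule, remaining = {}, list(negatives)
    while remaining:
        # Candidate selectors come from the seed's own field values.
        candidates = [(a, v) for a, v in seed.items() if a not in rule]
        if not candidates:
            raise ValueError("seed is indistinguishable from a negative example")
        # Keep the selector that rejects the most still-covered negatives.
        a, v = max(candidates,
                   key=lambda s: sum(neg[s[0]] != s[1] for neg in remaining))
        rule[a] = v
        remaining = [neg for neg in remaining if covers(rule, neg)]
    return rule

def cover(positives, negatives):
    # Learn rules until every positive example is covered by some rule.
    rules, uncovered = [], list(positives)
    while uncovered:
        rule = learn_rule(uncovered[0], negatives)
        rules.append(rule)
        uncovered = [p for p in uncovered if not covers(rule, p)]
    return rules

positives = [
    {"size": "small", "color": "red", "shape": "round"},
    {"size": "small", "color": "green", "shape": "round"},
    {"size": "large", "color": "red", "shape": "square"},
]
negatives = [
    {"size": "large", "color": "green", "shape": "round"},
    {"size": "large", "color": "green", "shape": "square"},
]
rules = cover(positives, negatives)
print(rules)  # [{'size': 'small'}, {'color': 'red'}]
```

The learned rule set reads "size = small OR color = red": every positive example satisfies one of the conjunctions and no negative example satisfies any.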

Statistical analysis technology

There are two kinds of relationships between database fields: functional relationships and correlations. Both can be analyzed with statistical methods, that is, by applying statistical principles to the information in the database. Commonly used statistical methods include regression analysis, correlation analysis, and difference analysis.
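As a small illustration of two of these methods, the sketch below computes Pearson's correlation coefficient r (the strength of a linear correlation) and an ordinary least-squares regression line y = a + bx (an approximate functional relationship) for two hypothetical database fields. Python 3.10+ also provides `statistics.correlation` and `statistics.linear_regression`; the formulas are written out here to show what they compute.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson_r(xs, ys):
    # Covariance divided by the product of the standard deviations.
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares(xs, ys):
    # Returns (intercept a, slope b) of the line y = a + b*x.
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Hypothetical fields: advertising spend (x) and sales (y).
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
r = pearson_r(ad_spend, sales)
a, b = least_squares(ad_spend, sales)
print(f"r = {r:.3f}, sales ~ {a:.1f} + {b:.1f} * ad_spend")
```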

Fuzzy set method

The fuzzy set method applies fuzzy set theory to carry out fuzzy evaluation, fuzzy decision-making, fuzzy pattern recognition, and fuzzy cluster analysis on practical problems. The more complex a system is, the greater its fuzziness. Fuzzy set theory generally uses the degree of membership to describe the extent to which a fuzzy object belongs to one category or another.
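A minimal sketch of degrees of membership, assuming triangular membership functions; the linguistic terms for "age" and their breakpoints are hypothetical. Unlike a crisp set, an element can belong partially to several fuzzy sets at once.

```python
def triangular(x, a, b, c):
    # Membership is 0 outside (a, c), rises linearly to 1 at b, then falls.
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets for age: (left foot, peak, right foot).
AGE_TERMS = {
    "young": (0, 20, 40),
    "middle-aged": (30, 45, 60),
    "old": (50, 70, 90),
}

def memberships(age):
    return {term: triangular(age, *abc) for term, abc in AGE_TERMS.items()}

print(memberships(35))
# {'young': 0.25, 'middle-aged': 0.3333333333333333, 'old': 0.0}
```

A 35-year-old is thus "young" to degree 0.25 and "middle-aged" to degree about 0.33, rather than being forced into exactly one category.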

Data mining tasks

Association analysis

When the values of two or more variables exhibit some regularity, this is called association. Data associations are an important kind of knowledge that can be discovered in a database. Associations are divided into simple associations, temporal associations, and causal associations. The purpose of association analysis is to uncover the hidden network of associations in a database. Two thresholds, support and confidence, are generally used to measure the strength of association rules, and further measures such as interest and correlation are increasingly introduced so that the mined rules better match the requirements.
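A minimal sketch of the two standard thresholds: the support of a rule A → B is the fraction of transactions containing both A and B, and its confidence is support(A and B) / support(A). For brevity it only examines rules between single items; algorithms such as Apriori extend the same measures to larger itemsets. The basket data are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    # Fraction of transactions that contain every item in `itemset`.
    return sum(itemset <= t for t in transactions) / len(transactions)

def single_item_rules(transactions, min_support, min_confidence):
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # infrequent pair: neither a -> b nor b -> a qualifies
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rules = single_item_rules(baskets, min_support=0.5, min_confidence=0.7)
print(rules)  # [('butter', 'bread', 0.5, 1.0)]
```

The pair {bread, butter} is frequent in both directions, but only butter → bread passes the confidence threshold (1.0 versus about 0.67 for bread → butter).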

Cluster analysis

Clustering divides data into several groups according to similarity: data in the same group are similar to one another, while data in different groups differ. Cluster analysis can establish high-level (macro) concepts and discover the distribution patterns of the data as well as possible relationships between data attributes.
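The text does not prescribe a clustering algorithm; k-means is one common choice and is sketched below under simple assumptions: points are numeric tuples, similarity is Euclidean distance, and the first k points serve as the initial centroids (practical implementations usually use random or k-means++ initialization). The points are hypothetical.

```python
import math

def kmeans(points, k, max_iter=100):
    # Assumption: the first k points are the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]  # hypothetical data
centroids, clusters = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1, 2), (2, 1)], [(8, 8), (9, 8), (8, 9)]]
print(centroids)
```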

Classification

Classification finds a concept description of a category that represents the overall information of that class of data, that is, a description of its defining characteristics, and uses this description to build a model, usually expressed as rules or a decision tree. Classification rules are obtained by applying an algorithm to a training data set. Classification can be used both for rule description and for prediction.
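To illustrate how classification rules can be learned from a training set, here is a sketch of OneR ("one rule"), a deliberately simple rule learner chosen for illustration rather than taken from the text. For each attribute it maps every value to that value's majority class, then keeps the attribute whose rules make the fewest training errors. The customer data are hypothetical.

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    best = None  # (errors, attribute, rules)
    for attr in rows[0]:
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        # One rule per attribute value: predict its most frequent class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in counts.items())
        if best is None or errors < best[0]:
            best = (errors, attr, rules)
    return best[1], best[2]

rows = [
    {"income": "high", "age": "young"},
    {"income": "high", "age": "old"},
    {"income": "low", "age": "young"},
    {"income": "low", "age": "old"},
    {"income": "high", "age": "young"},
]
labels = ["yes", "yes", "no", "no", "no"]  # hypothetical "buys product?" class
attr, rules = one_r(rows, labels)
print(attr, rules)  # income {'high': 'yes', 'low': 'no'}
```

The result is a readable rule set ("if income is high then yes, if low then no") that can describe the class and also predict it for new records.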

Prediction

Prediction uses historical data to discover patterns of change, builds a model, and uses that model to predict the types and characteristics of future data. Prediction is concerned with accuracy and uncertainty, which are usually measured by the prediction variance.
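A minimal sketch, assuming the simplest predictive model: a straight-line trend fitted by least squares to a hypothetical history. Alongside the point forecast it computes the standard estimate of the variance of a new observation at x0, `s2 * (1 + 1/n + (x0 - mean_x)**2 / Sxx)`, where `s2` is the residual variance and `Sxx` the sum of squared deviations of x. Under the usual assumption of independent, normally distributed errors, this variance sets the width of the prediction interval.

```python
def trend_forecast(history, steps_ahead=1):
    # Time index 0, 1, ..., n-1 serves as the explanatory variable x (needs n >= 3).
    n = len(history)
    xs = range(n)
    mean_x, mean_y = (n - 1) / 2, sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history)) / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance s^2 with n - 2 degrees of freedom (two fitted parameters).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, history))
    s2 = sse / (n - 2)
    x0 = n - 1 + steps_ahead
    forecast = intercept + slope * x0
    variance = s2 * (1 + 1 / n + (x0 - mean_x) ** 2 / sxx)
    return forecast, variance

history = [2, 4, 5, 4, 5]  # hypothetical observations
forecast, variance = trend_forecast(history)
print(f"forecast = {forecast:.2f}, prediction variance = {variance:.2f}")
# forecast = 5.80, prediction variance = 1.68
```

The `(x0 - mean_x)**2` term makes the variance grow as the forecast moves further beyond the observed data, which matches the intuition that distant predictions are less certain.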

Time series pattern

A time series pattern is a pattern with a high probability of recurring in a time series. Like regression, it uses known data to predict future values, but here what distinguishes the data points is the time at which each value is observed.
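One simple way to look for frequently repeating patterns, sketched under illustrative assumptions: each step of the series is encoded as a symbol (U for a rise, D for a fall, F for no change), fixed-length windows of symbols are counted, and the symbol that most often followed the most recent context serves as a naive next-step prediction. The encoding, window lengths, and data are hypothetical choices, not a standard method prescribed by the text.

```python
from collections import Counter

def symbolize(series):
    # Encode each step: U = rise, D = fall, F = flat.
    return "".join("U" if b > a else "D" if b < a else "F"
                   for a, b in zip(series, series[1:]))

def frequent_patterns(series, length=3, min_count=2):
    # Count every window of `length` symbols; keep those that recur often.
    s = symbolize(series)
    counts = Counter(s[i:i + length] for i in range(len(s) - length + 1))
    return {p: c for p, c in counts.items() if c >= min_count}

def predict_next(series, context=2):
    # Naive forecast: the symbol that most often followed the latest context.
    s = symbolize(series)
    last = s[-context:]
    followers = Counter(s[i + context] for i in range(len(s) - context)
                        if s[i:i + context] == last)
    return followers.most_common(1)[0][0] if followers else None

series = [10, 12, 14, 11, 13, 15, 12, 14, 16, 13]  # hypothetical observations
print(symbolize(series))                       # UUDUUDUUD
print(frequent_patterns(series, min_count=3))  # {'UUD': 3}
print(predict_next(series))                    # U
```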

Deviation analysis

Deviations contain a great deal of useful knowledge, and databases often contain many anomalies, so finding the anomalies in a database is very important. The basic method of deviation detection is to find the difference between observed results and reference values.
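A minimal sketch of this basic deviation test: compare each observation with its reference value and flag those whose relative difference exceeds a tolerance. The monthly figures, the reference values (for example a plan or forecast), and the 20% tolerance are hypothetical; reference values are assumed to be non-zero.

```python
def find_deviations(observed, reference, tolerance=0.2):
    flagged = []
    for key, obs in observed.items():
        ref = reference[key]
        relative = (obs - ref) / ref  # signed relative deviation
        if abs(relative) > tolerance:
            flagged.append((key, obs, ref, relative))
    return flagged

observed = {"Jan": 100, "Feb": 98, "Mar": 150, "Apr": 60}
reference = {"Jan": 100, "Feb": 100, "Mar": 100, "Apr": 100}
for key, obs, ref, rel in find_deviations(observed, reference):
    print(f"{key}: observed {obs}, reference {ref}, deviation {rel:+.0%}")
```

February's 2% shortfall stays within tolerance, while March and April are reported as anomalies worth investigating.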
