The main focus when learning data mining is the mining algorithms themselves, and which algorithm produces the best results.
In 2006, the IEEE International Conference on Data Mining (ICDM), an authoritative international academic conference, selected ten classic algorithms in the field of data mining: C4.5, K-means, SVM, Apriori, EM, PageRank, AdaBoost, KNN, Naive Bayes and CART.
Data analysis is the process of applying appropriate statistical methods to a large amount of collected data in order to extract useful information, draw conclusions, and study and summarize the data in detail. This process also underpins quality management systems. In practice, data analysis helps people make judgments so that they can take appropriate action.
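As a minimal sketch of the statistical side of data analysis, the Python snippet below computes basic descriptive statistics with the standard `statistics` module. The defect counts and the `summarize` helper are hypothetical, chosen only for illustration; note that the code only produces the numbers, and drawing a conclusion from them is still left to a person.

```python
import statistics

# Hypothetical sample: daily defect counts from a production line
# (illustrative numbers, not data from the original text).
defects = [3, 5, 2, 8, 4, 5, 6, 3, 5, 4]

def summarize(values):
    """Return basic descriptive statistics for a human analyst to inspect."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

summary = summarize(defects)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Here the mean and median are both 4.5; whether a day with 8 defects signals a quality problem is a judgment the analyst makes, not something the code decides.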
Data analysis tools:
Excel is a common analysis tool that can handle basic analysis work. In the business intelligence field, tools include Cognos, Style Intelligence, MicroStrategy, BO (BusinessObjects), Oracle, and domestic (Chinese) products such as the Yonghong Z-Suite BI suite.
The differences between data mining and data analysis can be summarized in the following points:
1. "Data analysis" focuses on observing data, while "data mining" focuses on discovering knowledge and patterns from data, that is, KDD (knowledge discovery in databases);
2. The conclusions of "data analysis" are the product of human intellectual activity, whereas the conclusions of "data mining" are patterns that machines discover from a learning set (also called a training set or sample set);
3. The conclusions of "data analysis" feed into people's own reasoning and decisions, whereas the patterns discovered by "data mining" can be applied directly to prediction;
4. "Data analysis" cannot build a mathematical model by itself, so the model has to be constructed manually, while "data mining" builds the mathematical model directly. For example, traditional cybernetic (control-theory) modeling essentially describes the functional relationship between input variables and output variables; "data mining" can establish this input-output relationship automatically through machine learning. Using the "rules" obtained through KDD, a given set of input parameters then yields a corresponding set of outputs.
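To illustrate point 4, here is a minimal sketch of a model being learned from a training set rather than written by hand, then used to map new inputs to outputs. It uses ordinary least-squares linear regression as a deliberately simple stand-in for machine learning (it is not one of the ten ICDM algorithms), and the training pairs and the `fit_linear`/`predict` helpers are hypothetical.

```python
# Hypothetical training set: each pair is (input x, observed output y).
# The numbers are illustrative only; they follow y = 2x + 1 exactly.
training_set = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

def fit_linear(pairs):
    """Learn slope and intercept by ordinary least squares (closed form)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sum of cross-deviations divided by sum of squared x-deviations gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the learned input-to-output relationship to a new input."""
    slope, intercept = model
    return slope * x + intercept

model = fit_linear(training_set)
print(model)               # (2.0, 1.0)
print(predict(model, 10))  # 21.0
```

Nobody specified "y = 2x + 1" in advance; the relationship was recovered from the examples. Real data mining replaces this one-variable line with richer models, but the workflow of learning from a training set and then predicting outputs from new inputs is the same.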