How to group data according to the classification results of index set of system clustering?
Principal component analysis transforms multiple indicators into a few comprehensive indicators and uses them to explain the variance-covariance structure of multivariate data. These comprehensive indicators are the principal components. The principal components should retain as much information about the original variables as possible and should be uncorrelated with each other. Factor analysis is a multivariate statistical method that studies how to condense many original variables into a few factor variables with the least loss of information, and how to make those factor variables more interpretable. Cluster analysis groups and classifies a large amount of experimental data according to qualitative or quantitative characteristics, in order to understand the internal structure of the data set and to describe each group. Its main basis is that samples in the same group should be similar to each other, while samples in different groups should be sufficiently different. The three methods have both differences and connections. This article compares their similarities and differences and explains how they relate in practical application, so that these statistical methods can be put to better use in research.
2. Similarities and differences of the basic ideas
(1) Common ground. Principal component analysis and factor analysis both use a few variables (factors) to reflect the main information of the original variables. Although the new variables are fewer than the original ones, they are usually chosen so that they account for more than 85% of the original information. Even with only a few new variables, the results are therefore reliable and can explain the problem effectively. The new variables are also uncorrelated with each other, which eliminates multicollinearity.
The new variables produced by these two methods are not simply the original variables that remain after screening. In principal component analysis, the new variables are linear combinations of the original variables x1, x2, ..., xp. The p original correlated variables xi are linearly transformed through a coordinate transformation, so each principal component Zi is a linear combination of the p original variables. Among the principal components, Z1 accounts for the largest share of the variance, which means it summarizes the original variables most strongly. Later principal components account for smaller shares of the variance and summarize the original information more weakly. Factor analysis uses a few common factors to explain the complex relationships among many observed variables. Instead of recombining the original variables, it decomposes each original variable into two parts: common factors and special factors. Common factors are shared by all variables, while a special factor belongs to one original variable only. After computing the scores of the new principal component variables or factor variables, the principal component scores or factor scores can replace the original variables in further analysis. Because there are far fewer principal components or factors than original variables, they reduce the dimension of the data and make it easier to process.
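
The description of principal components above can be illustrated with a minimal sketch. It assumes a small made-up data set (the values in `rows` are not from the article) and uses power iteration, a technique chosen here only for brevity, to find the first principal component Z1 of the sample covariance matrix. The names `covariance_matrix` and `first_principal_component` are hypothetical helpers, not part of any library.

```python
import math

def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centered data."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centered

def first_principal_component(cov, iterations=500):
    """Power iteration: repeatedly apply cov and renormalize.

    Assumes the start vector is not orthogonal to the leading eigenvector
    and that the largest eigenvalue is strictly larger than the others.
    """
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v: the variance carried by this component.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue

# Made-up illustration data: 6 samples, 3 correlated variables x1, x2, x3.
rows = [(2.5, 2.4, 1.0), (0.5, 0.7, 0.2), (2.2, 2.9, 1.1),
        (1.9, 2.2, 0.8), (3.1, 3.0, 1.5), (2.3, 2.7, 0.9)]

cov, centered = covariance_matrix(rows)
loadings, eigenvalue = first_principal_component(cov)

# Z1 scores: the linear combination of the centered original variables.
scores = [sum(l * x for l, x in zip(loadings, c)) for c in centered]
score_variance = sum(s * s for s in scores) / (len(scores) - 1)

# Share of the total variance (trace of cov) explained by Z1.
total_variance = sum(cov[i][i] for i in range(len(cov)))
share = eigenvalue / total_variance

print("loadings of Z1:", [round(l, 3) for l in loadings])
print("share of total variance:", round(share, 3))
```

The scores in `scores` are what would replace the original three variables in a later analysis if one component were judged sufficient. In practice a statistics package would normally be used instead of hand-written power iteration.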
The basic idea of cluster analysis is to use multivariate statistics to determine quantitatively how close the objects are to each other, taking into account the relationships among the objects' many factors and the dominant role some of them play, and then to divide the objects into categories according to their degree of proximity. This makes the classification more objective and practical, and lets it reflect the inherent relationships among the objects. In other words, cluster analysis treats the objects under study as points in a multidimensional space and divides them reasonably into several categories. It is a method of grouping step by step according to the similarity between variables or regions, and it can objectively reflect the internal groupings among those variables or regions [3]. Cluster analysis explores these relationships through a large symmetric matrix. It is a multivariate statistical method, and its result is a set of clusters. Once the objects have been clustered, the data become easier to process, so in a sense cluster analysis also reduces dimension.
(2) Differences. Principal component analysis studies how to explain the variance-covariance structure of multivariate data through a few principal components. It looks for a few principal components (variables) that keep as much information about the original variables as possible and are uncorrelated with each other. It is a mathematical transformation: a given set of variables is converted by a linear transformation into a set of uncorrelated variables (every pairwise correlation coefficient is 0, that is, the sample vectors are mutually perpendicular).
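
Returning to the question in the title, the step-by-step grouping described for cluster analysis can be sketched as follows. This is a minimal illustration, not the article's own procedure: it assumes Euclidean distance, average linkage, and a chosen number of groups `k`, none of which the text specifies, and the sample names and coordinates are made up. The function `agglomerative_groups` is a hypothetical helper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative_groups(points, k):
    """Merge the two closest clusters (average linkage) until k remain.

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None  # (distance, index of cluster a, index of cluster b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                total = sum(euclidean(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Made-up illustration data: five named samples in a 2-dimensional space.
samples = {
    "A": (1.0, 1.1),
    "B": (1.2, 0.9),
    "C": (5.0, 5.2),
    "D": (5.1, 4.9),
    "E": (9.0, 0.5),
}
names = list(samples)
points = [samples[name] for name in names]

# Cut the hierarchy at k = 3 and group the data by the resulting classes.
clusters = agglomerative_groups(points, k=3)
groups = [sorted(names[i] for i in cluster) for cluster in clusters]

for number, members in enumerate(groups, start=1):
    print(f"group {number}: {members}")
```

To cluster indicators (variables) rather than samples, the same sketch could be applied to the columns of the data table instead of its rows, typically with a correlation-based distance. That substitution is an assumption and is not shown here.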
In this transformation, the total variance (the sum of the variances) of the variables remains unchanged. The new variable with the largest variance is called the first principal component, the one with the second-largest variance is called the second principal component, and so on. If there are p variables in total, in practice one generally does not find all p principal components, but finds m (m