1. Transformation of quantitative data
Because different indicators (variables) differ in dimension and order of magnitude, the data must be transformed before they can be compared. This book applies a standardization transformation to the variables: each column is first centered on its mean and then scaled by its standard deviation, that is,
x'ij = (xij - x̄j) / sj
where xij is the value of variable j for sample i, and x̄j and sj are the mean and the standard deviation of column j.
Study on agricultural geological environment and sustainable utilization of agricultural resources in Poyang Lake area
After the transformation, each column has a mean of 0 and a variance of 1. Because the data are scaled by the standard deviation, the result remains relatively stable when the sample changes.
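As a minimal sketch of the column standardization described above (not the DPS routine used for this study), the following Python function centers each column and divides it by its standard deviation. The function name, the toy data and the choice of the n - 1 (sample) denominator are assumptions made for illustration; the text does not say which denominator was used.

```python
import math


def standardize_columns(rows):
    """Center each column on its mean, then divide by its standard deviation."""
    n = len(rows)        # number of samples
    m = len(rows[0])     # number of variables (columns)
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    # Assumption: sample standard deviation (n - 1 in the denominator).
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in rows) / (n - 1))
        for j in range(m)
    ]
    return [[(row[j] - means[j]) / stds[j] for j in range(m)] for row in rows]


# Hypothetical data: two variables on very different scales.
data = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
z = standardize_columns(data)
print(z)
```

After the transformation both columns of this toy table take the same values, which is the point of the step: the difference in scale no longer dominates any later distance calculation.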
2. Calculate the distance coefficient between indicators (variables)
The distance coefficient is a quantitative measure of how close variables or samples are to one another. Each sample is treated as a point in an m-dimensional space (one dimension per variable), and a distance is defined in that space; points that are close together are assigned to the same category, while points that are far apart are assigned to different categories. Commonly used distances include the Euclidean, absolute, Chebyshev, Mahalanobis, chi-square and oblique-space distances. The absolute distance is used here, that is:
dij = Σk |xik - xjk| (k = 1, ..., m)
where xik and xjk are the standardized values of variable k for samples i and j.
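The absolute distance between two samples is the sum of the absolute differences of their coordinates. The sketch below builds the full distance matrix for a small set of samples; the function names and the sample values are hypothetical and are not taken from the study data.

```python
def absolute_distance(a, b):
    """Absolute (city-block) distance: sum of |a_k - b_k| over all variables."""
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_matrix(rows):
    """Symmetric matrix of absolute distances between every pair of samples."""
    n = len(rows)
    return [[absolute_distance(rows[i], rows[j]) for j in range(n)] for i in range(n)]


# Hypothetical standardized samples (3 samples, 2 variables).
samples = [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
d = distance_matrix(samples)
for row in d:
    print(row)
```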
3. Calculate the distance between classes
In cluster analysis, a class is usually denoted by G and its elements by xi, xj, and so on; suppose G contains k elements. Let dij denote the distance between xi and xj, and Dpq the distance between classes Gp and Gq. If T is a given threshold and dij ≤ T for every xi, xj ∈ G, then G is called a class. Methods for calculating the distance between classes include the shortest distance method, the longest distance method, the class average method, the center of gravity (centroid) method, the between-group linkage method and the sum of squared deviations (Ward's) method. This book adopts the longest distance method: the distance between the two farthest elements of Gp and Gq is taken as the distance between the two classes, written Dce(p, q), so that:
Dce(p, q) = max{djk | j ∈ Gp, k ∈ Gq}
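The longest distance between two classes can be read directly off a matrix of pairwise distances. The following sketch assumes the classes are given as lists of sample indices; the function name and the distance matrix are hypothetical.

```python
def longest_distance(d, gp, gq):
    """Largest pairwise distance d[j][k] with j in class gp and k in class gq."""
    return max(d[j][k] for j in gp for k in gq)


# Hypothetical distance matrix for four samples lying at 0, 2, 4 and 7 on a line.
d = [
    [0, 2, 4, 7],
    [2, 0, 2, 5],
    [4, 2, 0, 3],
    [7, 5, 3, 0],
]
gp = [0, 1]  # class Gp holds samples 0 and 1
gq = [2, 3]  # class Gq holds samples 2 and 3
print(longest_distance(d, gp, gq))  # 7, the distance between samples 0 and 3
```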
Using the above calculation method, the results obtained with the DPS statistical software are shown in Figure 7-3. As can be seen from Figure 7-3, at an inter-class distance of 14.16 the counties (cities) fall into four categories. The first category contains three counties (cities): Nanchang County, Fengcheng City and Poyang County. The second category contains five counties (cities): Xinjian County, Zhangshu City, Yugan County, Jinxian County and Gao'an City. The third category contains seven counties: Anyi County, Hukou County, Yongxiu County, De'an County, Jiujiang County, Pengze County and Xingzi County. The fourth category contains six counties (cities): Duchang County, Fengxin, Dongxiang County, Yujiang County, Wannian and Leping City. According to their geographical location, Lushan District and Xunyang District of Jiujiang City were merged into the third category; Donghu District, Xihu District, Qingyunpu District, Wanli District and Qingshanhu District of Nanchang City were merged into the first category; and Linchuan District of Fuzhou City was merged into the fourth category.
Figure 7-3 Cluster Diagram of Counties (Cities)
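Reading categories off the cluster diagram at a given inter-class distance corresponds to merging classes, closest pair first, until the smallest remaining longest distance exceeds the chosen threshold. The sketch below illustrates that procedure on invented one-dimensional data; it is not the DPS implementation, it does not reproduce the county results, and the function name, data and threshold are assumptions.

```python
def complete_linkage_clusters(d, threshold):
    """Agglomerative clustering with the longest distance method.

    Starts with one class per sample and repeatedly merges the two classes
    whose longest distance is smallest, stopping once that distance exceeds
    `threshold`. Returns the classes as lists of sample indices.
    """
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > 1:
        best = None  # (distance, index of first class, index of second class)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = max(d[j][k] for j in clusters[a] for k in clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        if dist > threshold:
            break  # no remaining pair of classes is close enough to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical samples on a line; distances are absolute differences.
points = [0, 1, 5, 6, 20]
d = [[abs(p - q) for q in points] for p in points]
print(complete_linkage_clusters(d, threshold=3))  # [[0, 1], [2, 3], [4]]
```

Raising the threshold merges more classes; with a threshold of 6 the first two groups in this toy example combine, leaving two classes.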