I can only give a brief introduction here; explaining the deeper principles clearly would take a dedicated study on my part. The method itself is not hard to apply, though, and an incomplete grasp of the principles should not get in the way of using it. Think of this as getting to know a new method.
In fact, the topics coming up, such as multiple regression analysis, operations research, time series analysis, various forecasting models, clustering and classification, all involve a good deal of difficult mathematical derivation. Even after I have understood something myself, it takes me even longer to explain it in a simple way. So for now, in these study notes, I can only sketch the principle and then walk through a no-brainer application. When I understand things more deeply, I will come back and revise the articles that were not written deeply or clearly enough.
OK, let's get down to business and talk about grey relational analysis.
"As a system develops, if the trends of two factors are consistent, that is, they change synchronously to a high degree, the two factors can be said to be highly correlated; otherwise, their correlation is low. Grey relational analysis is therefore a method that measures the degree of correlation between factors, the so-called 'grey relational degree', by how similar or dissimilar their development trends are."
The passage above is taken from Baidu, and that is roughly the idea. The object studied by grey relational analysis is usually a system whose development is affected by many factors. We often want to know which of those factors are the main ones and which are secondary, which have a large influence and which a small one, which promote development and which inhibit it, and so on.
In mathematical statistics, regression analysis, analysis of variance and principal component analysis are the usual tools for this problem. All of them have shortcomings, however. They need a lot of data, and with too little data the results are meaningless; sometimes the samples are required to follow a particular distribution; and sometimes the quantitative results contradict the qualitative analysis. Grey relational analysis handles these issues well.
Grey relational analysis places no requirement on the sample size or on the regularity of the samples (of course, the sample should not be too small; two or three data points will not do), and its quantitative results are basically consistent with qualitative analysis. The basic idea is to judge how closely sequences are related from the geometric similarity of their curves. The closer the shapes of the curves, the greater the correlation between the corresponding sequences, and vice versa.
Put simply, the method studies the geometric similarity of the curves formed by two or more sequences (a sequence can be understood as a factor or indicator in the system). The more similar the curves, the closer the relationship, that is, the higher the correlation. So this method studies correlation almost purely from the data. If the curves of two unrelated indicators happen to have very similar shapes, grey relational analysis will conclude that the two indicators are highly correlated. Of course, this is an extreme example. For ordinary data or systems, measuring the degree of correlation by the shape of the curves is reasonable.
Let's start with the first and most basic application: system analysis. Its main task is to rank the "factors affecting system development" by importance or influence. In grey relational terms, this means ranking the factors by their degree of correlation with the system as a whole. The higher the degree, the greater the influence of that factor on the development of the system. The degree of correlation is the closeness of the curve shapes mentioned above. Honestly, I can sort of follow grey relational analysis, but it still feels a bit unreliable to me, haha.
Let's go straight to an example to show how grey relational analysis is applied. (The principle has already been explained.)
The following table shows the GDP statistics of a certain region (unit: million yuan). From 2000 to 2005, which industry had the greatest impact on the total GDP of this region?
Well, this is a typical system analysis problem: find the factor that has the greatest impact on the development of GDP. So what do we need to do? Since grey relational analysis compares the geometric similarity of sequence curves, we obviously have to draw the curves first. So the first step is to plot the sequences.
Note that if we want to study the correlation between individual factors and the whole system, we need an indicator that represents the overall development of the system, and here that indicator is total GDP. Similarly, the level of educational development could be represented by the average years of schooling, public order by the incidence of criminal cases, and national health by the number of hospital registrations. In any case, we always need one indicator that describes the overall development of the system.
Setting everything else aside and looking only at the shape of the curves, I think the primary industry has the least impact on GDP: GDP keeps rising while the primary industry curve is almost flat. Judging by similarity alone, the secondary industry, the gray curve, looks most similar to the GDP curve. However, the plot only gives an intuitive impression; how close the curve shapes really are still has to be calculated.
The second step is to determine the analysis sequences. They fall into two categories. The first is the parent sequence (also called the mother sequence), a data sequence that reflects the overall behavior or development of the system. It can be understood as the dependent variable in regression analysis, and here it is the total GDP column. The second is the child sequences (also called subsequences), the data sequences of the factors that affect the development of the system, which can be understood as the independent variables in regression analysis. Here they are the output data of the primary, secondary and tertiary industries.
The third step is data preprocessing. We have talked about preprocessing a lot before: normalization, standardization and so on. The purpose here is to remove the influence of units (dimensions) and narrow the range of the data so that calculation is easier, which is what data standardization is usually for. There are many ways to do it. One is z-score standardization, that is, subtracting the mean from the original data and dividing by the standard deviation, which is often used for random variables. Another is normalization, for example min-max scaling, (x - min) / (max - min). Both methods have been mentioned before.
Here, the standardization we use is to divide each element by the mean of its index, that is, $x_i(k)$ becomes $x_i(k) / \bar{x}_i$, where $\bar{x}_i$ is the mean of index $i$ over all years. OK, here is the processed data. Excel is enough for this and is the more convenient tool.
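For anyone who prefers code to Excel, the divide-by-the-mean step might look like the sketch below. The numbers are a made-up toy table, not the GDP data from this article, and the name `mean_standardize` is just something I picked for illustration.

```python
import numpy as np

# Hypothetical toy data (NOT the GDP table from the article):
# rows are years, columns are [total, factor 1, factor 2].
data = np.array([
    [10.0, 2.0, 8.0],
    [20.0, 4.0, 16.0],
    [30.0, 6.0, 24.0],
])

def mean_standardize(x):
    """Divide every column by its own mean, so each column averages to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.mean(axis=0)

scaled = mean_standardize(data)
print(scaled)
```

After this step every column has mean 1, so indicators measured on very different scales can be compared on the same plot.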
The fourth step is to calculate, element by element, the correlation between each processed child sequence and the parent sequence. Write the parent sequence as $x_0$ and the child sequences as $x_1$, $x_2$, $x_3$. We first calculate the two-level minimum difference between parent and child sequences, $a = \min_i \min_k |x_0(k) - x_i(k)|$, and then the two-level maximum difference, $b = \max_i \max_k |x_0(k) - x_i(k)|$. The calculation is as follows.
From the table above you can read off $a$ as its smallest element and $b$ as its largest element. With these we can calculate the correlation between each element of a child sequence and the corresponding element of the parent sequence.
Grey relational analysis defines $\xi_i(k) = \frac{a + \rho b}{|x_0(k) - x_i(k)| + \rho b}$, where $\rho$ is the resolution coefficient, which usually lies between 0 and 1 and is often taken to be 0.5. As for why this particular formula is used to define the correlation between an element of a child sequence and the corresponding element of the parent sequence? I don't know ... well, look it up yourself. If you know, please leave a comment and tell me. Thank you!
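As a rough sketch, the element-wise calculation of step four can be written in a few lines of NumPy. I am assuming the usual form of the coefficient, (a + rho*b) / (|x0(k) - xi(k)| + rho*b), with a and b the smallest and largest absolute differences over all child sequences; the function name and the toy numbers are hypothetical and the inputs are assumed to be standardized already.

```python
import numpy as np

def relational_coefficients(parent, children, rho=0.5):
    """Element-wise grey relational coefficients.

    parent:   shape (n,)   parent (mother) sequence, already standardized
    children: shape (n, m) one child sequence per column, already standardized
    rho:      resolution coefficient, assumed to lie in (0, 1)
    """
    parent = np.asarray(parent, dtype=float)
    children = np.asarray(children, dtype=float)
    diff = np.abs(children - parent[:, None])  # |x0(k) - xi(k)| for every i, k
    a = diff.min()                             # two-level minimum difference
    b = diff.max()                             # two-level maximum difference
    # Note: if every child equals the parent, b is 0 and this divides by zero.
    return (a + rho * b) / (diff + rho * b)

# Hypothetical toy sequences, chosen so the arithmetic is easy to follow.
parent = np.array([1.0, 2.0, 3.0])
children = np.array([[1.0, 3.0],
                     [2.0, 2.0],
                     [3.0, 1.0]])
xi = relational_coefficients(parent, children)
print(xi)
```

In this toy case the first child is identical to the parent, so all of its coefficients are 1; the second child differs by 2 at both ends, which gives 1/3 there and 1 in the middle.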
The fifth step is to calculate the correlation between sequences, that is, between each indicator and the whole system. We define $r_i = \frac{1}{n} \sum_{k=1}^{n} \xi_i(k)$, where $\xi_i(k)$ is the element-wise correlation from step four and $n$ is the number of elements in each sequence, and use it to express the correlation between indicator $i$ and the overall development of the system.
In other words, step four gives the degree of correlation between each element of an indicator and the corresponding element of the parent sequence, and their average is taken as the degree of correlation between the indicator and the whole system. If you can accept the formula for the element-wise correlation, accepting the average should not be too hard.
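The averaging in step five is a one-liner once the element-wise coefficients are in a matrix. A minimal sketch, using a small hypothetical coefficient matrix (rows are time points, columns are indicators) rather than the article's results:

```python
import numpy as np

# Hypothetical element-wise coefficients: 3 time points, 2 indicators.
xi = np.array([
    [1.0, 1.0 / 3.0],
    [1.0, 1.0],
    [1.0, 1.0 / 3.0],
])

degree = xi.mean(axis=0)       # relational degree of each indicator
ranking = np.argsort(-degree)  # indicator indices, most related first

print(degree)
print(ranking)
```

Here indicator 0 has degree 1 and indicator 1 has degree 5/9, so indicator 0 ranks first.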
The picture above shows the final result for this problem. The calculation shows that with a resolution coefficient of 0.5, the tertiary industry has the greatest influence on GDP. That does not seem to agree with the plot ... Intuitively, the curve of the secondary industry looks closest to the GDP curve, yet the result is the tertiary industry. OK, let's try another value of the resolution coefficient.
After trying a few other values, the tertiary industry still has the greatest impact on GDP. That said, 0.5 is the value most commonly used in practice.
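If you want to repeat this kind of check yourself, a small loop over the resolution coefficient is enough. The sketch below is only an illustration under my own assumptions: it standardizes by the mean, uses the usual coefficient formula, and runs on made-up series, so its output says nothing about the GDP example.

```python
import numpy as np

def relational_degrees(parent, children, rho):
    """Mean-standardize, compute element-wise coefficients, average per child."""
    parent = np.asarray(parent, dtype=float)
    children = np.asarray(children, dtype=float)
    parent = parent / parent.mean()
    children = children / children.mean(axis=0)
    diff = np.abs(children - parent[:, None])
    a, b = diff.min(), diff.max()
    return ((a + rho * b) / (diff + rho * b)).mean(axis=0)

# Hypothetical toy series (not the article's data).
parent = np.array([10.0, 12.0, 15.0, 19.0])
children = np.array([
    [4.0, 5.0],
    [4.2, 6.0],
    [4.3, 7.6],
    [4.5, 9.8],
])

for rho in (0.1, 0.5, 0.9):
    r = relational_degrees(parent, children, rho)
    print(rho, np.round(r, 3), "ranking:", np.argsort(-r))
```

Whether the ranking changes with the resolution coefficient depends on the data, so it is worth printing it for a few values rather than assuming either way.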
If I had to force an explanation, it probably comes down to the fluctuation of the GDP growth rate. From 2002 to 2005 the slope of the GDP line changes from segment to segment, while the secondary industry runs in almost a straight line over the same years. By contrast, the way the tertiary industry's growth changes looks more like the way GDP changes ... well, as I said, a forced explanation.
The picture above shows the annual increments ... well, gray and blue do look similar, but for the increments of 2003 to 2005, that is, the growth over the four years 2002 to 2005, it is the tertiary industry and GDP that are similar. The secondary industry is similar for only one or two years, so on the whole it may be that the tertiary industry has the greater impact on GDP.
OK, that's the end of the forced explanation.
Finally, there are two questions about system analysis.
Okay, so much for system analysis.
Next is comprehensive evaluation. The core of grey relational analysis here is to determine the weight of each index from its degree of correlation, and then compute each object's score as the weighted sum over the indices.
Take the same twenty rivers again. How should grey relational analysis be used to evaluate their water quality?
The first step is to make all indicators positively oriented. Positive orientation, as you know, means converting all smaller-is-better, intermediate (target-value) and interval-type indicators into larger-is-better indicators. In other words, after conversion, the larger the value, the higher the final score.
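The article does not spell out the conversion formulas here, so the sketch below uses three commonly seen choices purely as assumptions: `max - x` for smaller-is-better indicators, and one minus the scaled distance to the target value or target interval for the other two types. The function names, the target values and the sample numbers are all hypothetical.

```python
import numpy as np

def min_to_max(x):
    """Smaller-is-better -> larger-is-better (assumed rule: max - x)."""
    x = np.asarray(x, dtype=float)
    return x.max() - x

def mid_to_max(x, best):
    """Target-value type: 1 at `best`, decreasing with distance from it."""
    x = np.asarray(x, dtype=float)
    dev = np.abs(x - best)
    m = dev.max()
    return np.ones_like(x) if m == 0 else 1 - dev / m

def interval_to_max(x, lo, hi):
    """Interval type: 1 inside [lo, hi], decreasing with distance outside it."""
    x = np.asarray(x, dtype=float)
    dist = np.where(x < lo, lo - x, np.where(x > hi, x - hi, 0.0))
    m = dist.max()
    return np.ones_like(x) if m == 0 else 1 - dist / m

print(min_to_max([1.0, 3.0, 2.0]))                          # pollutant-like indicator
print(mid_to_max([6.0, 7.0, 8.0, 9.0], best=7.0))           # e.g. a pH-like indicator
print(interval_to_max([35.0, 36.5, 38.0, 39.0], 36.0, 37.0))
```

Other conversions (for example taking reciprocals for smaller-is-better indicators) are also in use, so treat these three as one possible set rather than the definitive rule.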
The second step is to standardize the positively oriented matrix. The standardization is the same as in the system analysis above: divide each element by the mean of its index, which narrows the range of the data and removes the influence of units. Denote the matrix obtained after these two steps by $Z = (z_{ij})$, with one row per evaluation object and one column per index.
Step three: take the maximum value of each row of $Z$ to form the mother sequence. This is the point to watch when grey relational analysis is used for comprehensive evaluation: the mother sequence is constructed artificially.
Step four: following the method described above, calculate the grey relational degree between each index and the mother sequence, and denote it $r_j$ for index $j$.
Step five: calculate the weight of each index, $w_j = r_j / \sum_{k} r_k$, that is, the share of its relational degree in the total.
Step six: compute the score of each evaluation object. For object $i$, the score is $S_i = \sum_{j} w_j z_{ij}$, where $z_{ij}$ is the element of the positively oriented, standardized matrix $Z$. Every index in $Z$ is larger-is-better, so a higher value means a higher score, and the influence of units has been removed. We therefore treat the elements of $Z$ directly as the scores of each object under each index and sum them with the weights obtained from the grey relational degrees. This gives the final score.
Step seven: normalize the scores by dividing each score by the sum of all scores, which puts every score between 0 and 1. The advantage of normalization is that a score can then be read as the share of the corresponding object in the whole set of objects, that is, its relative position. For the water quality problem, it is the position of one river's water quality among all the rivers. Put more casually, it is similar to "your grades are better than xx% of your classmates". That is the purpose of normalization.
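Steps two through seven can be strung together roughly as follows. This is a sketch under stated assumptions, not a reference implementation: the input is assumed to be positively oriented and positive already, scores are normalized by their sum, and the function name `gra_scores` and the 4-by-3 data are hypothetical (the article's twenty rivers are not reproduced here).

```python
import numpy as np

def gra_scores(X, rho=0.5):
    """Grey relational weights and normalized scores.

    X: shape (n objects, m indices), every index larger-is-better and positive.
    Returns (weights of the m indices, normalized scores of the n objects).
    """
    X = np.asarray(X, dtype=float)
    Z = X / X.mean(axis=0)                   # step 2: divide each index by its mean
    mother = Z.max(axis=1)                   # step 3: row maxima as mother sequence
    diff = np.abs(Z - mother[:, None])
    a, b = diff.min(), diff.max()
    xi = (a + rho * b) / (diff + rho * b)    # element-wise coefficients
    r = xi.mean(axis=0)                      # step 4: relational degree per index
    w = r / r.sum()                          # step 5: weights
    S = Z @ w                                # step 6: weighted score per object
    return w, S / S.sum()                    # step 7: normalize by the total

# Hypothetical data: 4 objects, 3 already positively oriented indices.
X = np.array([
    [8.0, 0.9, 30.0],
    [6.0, 0.7, 45.0],
    [9.0, 0.5, 20.0],
    [4.0, 0.8, 50.0],
])
w, scores = gra_scores(X)
print("weights:", np.round(w, 3))
print("scores: ", np.round(scores, 3))
print("ranking:", np.argsort(-scores))
```

Both the weights and the normalized scores sum to 1, so each score can be read as that object's share of the total.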
The following figure shows the water quality evaluation results from the TOPSIS method and from grey relational analysis.
As you can see, the two methods give different final rankings for this problem. The river ranked first differs and the order in the middle differs, but overall the rankings are similar. Haha, you could even bring in the analytic hierarchy process as a third method and average the normalized scores from all three as the basis for the final ranking. See, doesn't the model look sophisticated all of a sudden?
Well, that's all for this article. In fact, there are still some puzzling problems that have not been solved.
The latter two can be explained, if somewhat forcibly, like this: because we treat the positively oriented, standardized matrix as a score matrix, taking the maximum of each row constructs the optimal score sequence of the system, and each scheme is equivalent to one possible development of the system. Calculating the relational degree then shows how much each index influences the optimal sequence of the system. The greater the influence, the greater the weight we give it ... well, a forced explanation.
If anyone has better ideas on the three questions above, I hope you will leave a comment and tell me. Thank you in advance! If I figure them out later, I will update this article. (The version on the WeChat official account may not get updated, but the Zhihu version and the other one can be.)
That is all I can share on grey relational analysis. If you want to go further, you can read Grey System Theory and Its Application by Liu Sifeng et al. Grey systems also have other applications, such as grey prediction, grey combination models, grey decision-making and grey clustering evaluation. Have a look if you are interested.
Over the last couple of days Zhihu has pushed me some questions and answers about mathematical modeling, including ones recommending books on the subject. I tracked down electronic versions of the books recommended in the highly upvoted answers. If you need them, reply "Mathematical Modeling Book" in the backend of the WeChat official account "I am Chen".