2022-06- 14 - Training Enrollment Network

2022-06- 14

Master 12 minimalist statistical analysis method in 15 minutes-hit one, two, three, four, five and you will be able to beat the tiger.

John Cheng

2022-6- 12

After my new book "Mathematical Analysis of Total Inventory Management" went on the market, many readers gave feedback and couldn't understand it. This incident has always depressed me, but I also understand that in fact, many people, including me, may have failed to study well in the three courses of "Advanced Mathematics" in universities (Advanced Mathematics, Linear Algebra, Probability and Mathematical Statistics, and compulsory courses for general engineering students, including "Advanced Mathematics" in my book), so they may just be finished after the exam. Not at all.

However, in the supply chain management, especially the core supply chain planning management, we must process the data, constantly analyze the data, dig and refine the valuable signals that may exist behind it, and if necessary, we may need to do some simple statistical forecasting with Excel, R language and other tools, which inevitably requires the use of some so-called "advanced mathematics" knowledge. Simple as it is, you must master it.

If Mathematical Analysis of Total Inventory Management gives you a headache, I suggest you be patient and spend more than ten minutes reading this article in combination with Excel. If you can get some interest, you can continue to read my book and learn more practical data analysis tools and methodologies. But if you are not interested in it, or even can't understand it, then don't read it.

When we were young, we all knew a children's song called "12345, Going Up the Mountain to Shoot the Tiger". Here we take the number y=( 1, 2,3,4,5) as an example. I believe that if you can really understand this 12345, you will really dare to go up the mountain and shoot tigers (play supply chain data analysis)!

Let's assume that this y=( 1, 2, 3, 4, 5) is the shipment of a company in the past1~ May (that is, x =( 1, 2, 3, 4, 5)), where X represents the time and period, and Y represents the shipment during the period.

I have listed 12 small problems here, representing 12 mathematical formulas and data analysis methodology:

1. Average shipment volume in the past 5 months.

This question is simple. It is estimated that 99.999% people can do it. The answer is (1+2+3+4+5)/5=3, and the corresponding Excel function formula is average ().

But don't underestimate this simple arithmetic mean calculation, which is the first step for you to have a "number" of customer needs, and this is what I have repeatedly emphasized in the first chapter of this book-the average is very important.

According to my personal experience in managing people and things in the field of supply chain management, if you can casually say the weekly average demand of a product, SKU or model you are responsible for without looking at a computer or a mobile phone, you will basically have a "number" in your heart.

You can immediately use this "average" to measure the people around you, and the result may disappoint you!

2. Truncation (0.2) average (pruning average)

How many people know this so-called truncated average? I don't know much about it, because if I hadn't studied R language, I didn't know there was such a thing before, so I believe many people don't know it (I'm doing "maximum likelihood estimation (MLE)", the fourth chapter of mathematical analysis of total inventory management), although many people have heard of its application, which is similar to "removing a highest score"

This definition is also very useful in supply chain management. For example, when evaluating the historical needs of customers, we may artificially delete the maximum and minimum values that seem abnormal. As for the proportion, you decide for yourself. Excel formula is trimmean (), where trim itself means pruning, and mean actually means average, just like average. As for why we don't use Trim Mean () to do this expression in Excel, we will.

Our example y=( 1, 2, 3, 4, 5), the amount of data is very small, and the data itself is very neat. Whether trim=0.2 or 0. 1, 0.3, 0.4, the result is 3, but if you change that 5 into 6 and try these four ratios again, their results will be different. If you are interested, you can simulate more data to see what this formula means.

3. Median (median)

As the name implies, the so-called median is the value in the middle of the sequence. In our example, a * * * has five numbers, and 3 is the median, because there are two brothers in front and two sisters behind. It's the third child!

The median value in the Excel formula is median ().

As far as I know, this median also means to remain "neutral". No matter how nonsense the brothers and sisters talk, it will never change. Unlike arithmetic average, it belongs to the nature of swing. When brothers and sisters quarrel, it also quarrels with them, so it has a nickname "being averaged". For example, if you change this 12345 to 12346, the median value is still 3, but the average value has changed from 3 to 3.2. Therefore, many times, we would rather believe the median, because this average is not very reliable, and it is said that people have died! I also heard that a big statistician drowned in a river with an average depth of less than 1 m.

Ideally, the median and the average are the same or similar in size, so the data structure distribution is generally better, and may even be the legendary normal distribution.

4. quartile deviation (IQR)

When analyzing a set of data and a time series, we can usually divide the data into four segments on average, so that each segment accounts for 25% of the total data. It is estimated that this "four points" means this. After "four points", it is naturally "five points", that is, four sections of five points. These five points can be named Q 0, Q 1 and Q 2 respectively. Q 0 is the smallest (min), Q 4 is the largest (max), and the others are 25%, 50% and 75% respectively, so Q 2 is the third child, but strangely, here Q 2 refers to the average, not the median. I think the statisticians may have made a mistake: other brothers and sisters are talking about "location". Why does this Q 2 have to be the average?

The quartile deviation (IQR) is IQR = Q3–q1. In our example of 12345, IQR=Q 3-Q 1 = 4-2 = 2.

In Excel, there is a graph called boxplot, and its principle is this IQR. It's just that the middle of this box is the median, not Q 2 (average), as I understand it. Q 2 should be a median! Unfortunately, the two sides of this box are not completely equal to Q 1 and Q 4. There are two boundary lines above and below the box, which should be the minimum and maximum outliers mentioned below, but they are actually Q 0 and Q 4, which I don't understand. In this case, what is the significance of this box diagram? My ideal boxplot should be like this: the middle of the box is Q 2 or median, the upper and lower covers are Q 1, and Q 3 represents the concentration range of data, or the "distribution" range of data; The two antennas protruding from the upper and lower covers should be the maximum and minimum abnormal values respectively, so as to define the "normal range", and the abnormal values outside the antennas. I think so because from the perspective of supply chain management, the maximum value (max) and the minimum value (min) are irrelevant, and they cannot represent normal or abnormal. We are more concerned about the so-called abnormality, because there is so-called "abnormal demand" in supply chain management.

Let's talk about this problem.

5. Outliers (minimum and maximum outliers)

The so-called abnormal is abnormal, and normal and abnormal are relative. Since it is relative, there must be a relative scale, that is, the maximum abnormal value and the minimum abnormal value. Those within the range are called "normal values" and those outside the range are abnormal values.

This range is defined as Q2 1.5 iqr.

For our example y =( 1, 2,3,4,5), the maximum outlier is Q2+1.5 iqr = 3+1.5× 2 = 6, and the minimum outlier is Q2-1.5 iqr = 3. But if you change that 5 to 6, 6 is definitely "abnormal", and interested readers can try that formula with Excel.

It needs to be clear that this so-called outlier is not the same concept as the "truncation" part of the truncated mean mentioned in the second question-the truncated value is not necessarily an "outlier", and the outlier should be the truncated object. I haven't done strict mathematical proof of this conclusion myself, but you might as well get a few more strange numbers and then use Excel to simulate and verify it.

6 * * * *. Difference

As the name implies, variance is "the square of difference", and statistical variance refers to the average value of the square of difference after subtracting its average value from each number in a group.

Back to our example y=( 1, 2, 3, 4, 5), the average value is 3, and the difference after y-3 is: (-2,-1, 0, 1, 2), and the square of the difference is (4, 1. Its average value is 10/5=2 or 10/(5- 1)=2.5, which can be expressed by Excel formula. One is var.p () and the other is var.s (), where p and s respectively represent the population and the sample (. In practical application, both can be used, because they reflect the relative dispersion of data, and different arrays can be calculated and compared with the same formula.

7. Standard deviation

The standard deviation is obtained by directly opening the square root. For our example, the standard deviation we can get is root (2)= 1.4 14 or root (2.5)= 1.58 1, or we can calculate the standard deviation with Excel formulas stdev.p () and stdev.s ().

This so-called standard deviation is actually the legendary Sigma, and Sigma is the standard deviation. If you think the demand distribution conforms to the normal distribution, the average value of 1σ can cover about 68% of the data distribution, the average value of 2σ can cover about 95% of the data distribution, and the average value of 3σ can cover about 99% of the data distribution. If you don't think the demand distribution is normal, it is random, but it doesn't matter. Even so, the average value of 2 σ can cover about 87% of the data distribution, and the average value of 3 σ can cover about 95% of the data distribution. This is a conclusion based on Markov inequality (the content of Chapter 3 of Mathematical Analysis of Total Inventory Management).

8. Demand fluctuation

The standard deviation divided by the arithmetic mean is the so-called volatility, which is statistically called CV (coefficient of variation). This resume is very important for us to analyze customer or market demand, and it is one of the important indicators of demand classification.

For our 12345, the demand volatility is1.414/3 = 0.471or 1.58 1/3=0.527.

Similarly, this demand fluctuation is relative. Different products or the same product come from different customers and different distribution centers, and their demand fluctuations may be different. In my book "Total Inventory Management of Manufacturing Industry", this volatility is used for XYZ classification.

Volatility is also a very intuitive indicator to measure the effect of demand aggregation-the volatility of demand after merger is less than the sum of the volatility of individual demand before merger.

9. First-order difference (difference)

The so-called first-order difference is the internal subtraction of a group of numbers. The eldest one subtracts the second one, and the second one subtracts the third one, which can be reduced to the last. Very simply, the first-order difference result obtained by our example is (1, 1, 1, 1).

What does the first-order difference mean?

Or our example, y=( 1, 2,3,4,5) is a diagonal line, and the sequence after the difference (1, 1) is a horizontal straight line. From the perspective of supply chain management, do you prefer whether the customer's demand is diagonal or approximately horizontal?

From the perspective of statistical forecasting, it goes without saying that horizontal straight lines are better for forecasting, because the demand is relatively stable. It is only important to note that statistically speaking, "stability" may not be consistent with what you think in your mind. This stable English word is static, which means that the moving average of demand is relatively stable. Although demand may fluctuate in a certain period of time, there is no obvious upward or downward trend. This is "stable demand", which is different from stability. Stability means stability, that is, the demand in each period is roughly the same.

10. Demand trend function

For time axis x=( 1, 2, 3, 4, 5), what is the functional expression of shipment y=( 1, 2, 3, 4, 5)?

We can get the result in two ways. One is to use Excel as a line chart, add a trend line, select "linear" and display the formula, and we get y = x；; Another method is to use index(linest ()) in Ecxcel to find the value of A in y=ax+b as 1 and b=0.

Function and equation thinking is an important part of the treasure house of mathematical thinking, and supply chain management also needs some function and equation thinking (for details of this part, including the latter two knowledge points, please refer to Chapter 6 of Mathematical Analysis of Total Inventory Management).

1 1. The first derivative of the demand function

The demand function is y=x and its first derivative is y'= 1. If you look at the ninth question, you will find that this first derivative is equal to the first differential value, both of which are 1. Why?

The reason is very simple, not strictly understood, infinite difference is differential, and derivative can be obtained after differential. Both the first-order difference and the first-order derivative represent the growth rate of demand.

12. Forecast the demand value of the 6th phase.

With the previous work, the demand forecast for the sixth period, that is, June, is very simple. We can use three methods to complete this statistical prediction:

1. Difference prediction method: According to the ninth question, the difference sequence we get is (1, 1, 1), which is a horizontal straight line. Then, naturally, we will "extrapolate the trend" and use the difference equation Y 6-Y 5 = Y 6-

The second method is to use the trend function: y=x, and when x=6, y=x=6.

The third method is to use the first derivative: we know that the first derivative of this function represents the growth rate of demand, and this value is 1, so naturally, we use y 5+1= y 6 = 5+1= 6.

Of course, the simple and convenient statistical prediction method is the average method, that is, y 6 = 3 or simply take the value of the previous period as the prediction of the next period, that is, y 6 = y 5 = 5.

In addition, we can also consider using the moving average method, including the "natural weighted average method" mentioned in an article to forecast the demand in June.

However, no matter which prediction result, or that sentence, you believe in statistical prediction and you don't believe it.

Write it at the end

I believe that most readers can understand this 12 knowledge point, but it may take a process from understanding, to application, to skilled application, and even to perfection. You can't do it without some effort. So, don't underestimate this 12345. It can also be used to shoot tigers when it is done! You can even think that this is a minimalist version of the mathematical analysis of total inventory management.

In addition, I made this 12 knowledge point into a quiz star. The link is as follows. If you have time, you can invite your colleagues and friends, especially the bosses, to do it and test your IQ.

TIM subscription number: ITOOTD

How to pronounce ε?

The 10th 12 1 means 1 (), and the 2nd () means 2 ().

Equal sign german

How is Song's math in Dalian?

What is the special meaning of the number 14?

What are the three basic abilities of mathematics?

What are the requirements for the postgraduate entrance examination in Sichuan University?

20 14 Xuzhou Sanmao Mathematics

How old are Chinese characters in China?

What did you learn when you were a graduate student in mathematics?