Contents
Overview
Formula
Definition of variance
Calculation of variance
Several important properties of variance
Expectation and variance of common random variables
Application of statistics
Chebyshev inequality
Overview
Consider the following example.
The true length of a part is known to be $a$. It is measured 10 times with each of two instruments, A and B, and each measurement result $X$ is plotted as a point on a coordinate axis, as shown in the figures:
Instrument A measurement results: [figure]
Instrument B measurement results: [figure]
The measurement results of both instruments average to $a$. Nevertheless, when these results are used to compare the two instruments, instrument B is clearly judged to perform better, because its measurements are concentrated near the average.
This shows the need to study how far a random variable deviates from its mean. What quantity should measure this deviation? It is easy to see that $E(|X-E(X)|)$ measures the deviation of a random variable from its mean $E(X)$. However, because this expression contains an absolute value, it is inconvenient to work with, so the quantity $E\{[X-E(X)]^2\}$ is usually used instead.
The numerical characteristic $E\{[X-E(X)]^2\}$ is the variance.
Formula
Variance is the expected value of the squared difference between a value and its expected value, and the standard deviation is the square root of the variance. In practical calculation, the following formula is used.
For a set of data, the variance is the average of the squared differences between each data point and the mean, that is, $S^2 = \frac{1}{n}\left[(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \dots + (x_n-\bar{x})^2\right]$, where $\bar{x}$ is the sample mean.
However, when $\frac{1}{n}\left[(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \dots + (x_n-\bar{x})^2\right]$ is used as an estimate of the variance of the population $X$, its mathematical expectation turns out not to be the variance of $X$ but $\frac{n-1}{n}$ times the variance of $X$. The mathematical expectation of $\frac{1}{n-1}\left[(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \dots + (x_n-\bar{x})^2\right]$ is exactly the variance of $X$, so this form is an unbiased estimate.
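The difference between the two divisors can be seen in a short Python sketch. The data values and the function names `biased_variance` and `unbiased_variance` are hypothetical, chosen only for illustration; the standard-library functions `statistics.pvariance` and `statistics.variance` are used as a cross-check.

```python
from statistics import mean, pvariance, variance


def biased_variance(data):
    """Divide the sum of squared deviations by n."""
    m = mean(data)
    return sum((x - m) ** 2 for x in data) / len(data)


def unbiased_variance(data):
    """Divide the sum of squared deviations by n - 1."""
    m = mean(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)


# Hypothetical sample, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s2_n = biased_variance(data)            # sum of squares 32, divided by 8
s2_n_minus_1 = unbiased_variance(data)  # sum of squares 32, divided by 7

print(s2_n, pvariance(data))
print(s2_n_minus_1, variance(data))
```

For the same data the estimate with divisor $n-1$ is always the larger of the two, and the gap shrinks as $n$ grows.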
Informally, variance is the degree of deviation from the center. It measures the fluctuation of a batch of data, that is, how far the data deviate from their average. For samples of the same size, the greater the variance, the greater the fluctuation of the data and the less stable they are.
Definition of variance
Let $X$ be a random variable. If $E\{[X-E(X)]^2\}$ exists, then it is called the variance of $X$ and is denoted $D(X)$ or $DX$.
That is, $D(X) = E\{[X-E(X)]^2\}$ is called the variance, and $\sigma(X) = \sqrt{D(X)}$ (which has the same dimension as $X$) is called the standard deviation (or mean square deviation). In statistics, both are used to measure the dispersion of a set of data.
Variance describes how widely the values of a random variable are dispersed about its mathematical expectation.
If the values of $X$ are concentrated, the variance $D(X)$ is small;
if the values of $X$ are dispersed, the variance $D(X)$ is large.
Therefore, $D(X)$ is a scale for measuring the degree of dispersion of the values of $X$.
Calculation of variance
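By definition, the variance is the mathematical expectation of a function of the random variable $X$, namely of
$g(X) = [X-E(X)]^2$.
For a discrete random variable $X$ that takes the value $x_i$ with probability $p_i$, this gives:
$D(X) = \sum_i [x_i - E(X)]^2 p_i$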
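From the definition of variance, the following commonly used calculation formula can be obtained:
$D(X) = \sum_i x_i^2 p_i - E(X)^2$
It follows by expanding the square and using $\sum_i p_i = 1$ and $\sum_i x_i p_i = E(X)$:
$D(X) = \sum_i \left(x_i^2 p_i + E(X)^2 p_i - 2 x_i p_i E(X)\right)$
$= \sum_i x_i^2 p_i + \sum_i E(X)^2 p_i - 2E(X) \sum_i x_i p_i$
$= \sum_i x_i^2 p_i + E(X)^2 - 2E(X)^2$
$= \sum_i x_i^2 p_i - E(X)^2$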
Variance is actually the square of standard deviation.
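As a check on the calculation formula, the sketch below computes the variance of a discrete random variable both from the definition and from the shortcut form, using exact fractions. A fair six-sided die is assumed as the example distribution, and the function names are hypothetical.

```python
from fractions import Fraction


def expectation(values, probs):
    """E(X) = sum of x_i * p_i."""
    return sum(x * p for x, p in zip(values, probs))


def variance_by_definition(values, probs):
    """D(X) = sum of (x_i - E(X))^2 * p_i."""
    mu = expectation(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))


def variance_by_shortcut(values, probs):
    """D(X) = sum of x_i^2 * p_i, minus E(X)^2."""
    mu = expectation(values, probs)
    return sum(x * x * p for x, p in zip(values, probs)) - mu ** 2


# Assumed example: a fair die, each face with probability 1/6.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

d_def = variance_by_definition(values, probs)
d_short = variance_by_shortcut(values, probs)
print(d_def, d_short)
```

Both routes give the same exact value, which is the point of the derivation: the shortcut only needs $E(X)$ and $E(X^2)$.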
Several important properties of variance
(1) Let $c$ be a constant; then $D(c) = 0$.
(2) If $X$ is a random variable and $c$ is a constant, then $D(cX) = c^2 D(X)$.
(3) Let $X$ and $Y$ be two random variables; then
$D(X+Y) = D(X) + D(Y) + 2E\{[X-E(X)][Y-E(Y)]\}$
In particular, when $X$ and $Y$ are independent, the third term on the right-hand side (twice the covariance of $X$ and $Y$) is 0,
so $D(X+Y) = D(X) + D(Y)$. This property extends to the sum of any finite number of mutually independent random variables.
(4) $D(X) = 0$ if and only if $X$ takes a constant value $c$ with probability 1, that is, $P\{X=c\} = 1$, where $c = E(X)$.
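Properties (1) to (3) can be checked numerically. The sketch below treats a small list of paired values as a joint distribution in which every pair is equally likely; the data and the helper names `pvar` and `pcov` are hypothetical. For the independent case it pairs every value of one variable with every value of the other, which makes the covariance term vanish.

```python
from itertools import product
from math import isclose
from statistics import fmean


def pvar(xs):
    """D(X) when every listed value is equally likely."""
    m = fmean(xs)
    return fmean([(x - m) ** 2 for x in xs])


def pcov(xs, ys):
    """E{[X-E(X)][Y-E(Y)]} when every listed pair is equally likely."""
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


# Hypothetical paired values, for illustration only.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.0, 1.0, 5.0, 2.0]
c = 3.0

# (1) A constant has zero variance.
prop1 = pvar([c] * len(xs)) == 0.0

# (2) D(cX) = c^2 D(X).
prop2 = isclose(pvar([c * x for x in xs]), c ** 2 * pvar(xs))

# (3) D(X+Y) = D(X) + D(Y) + 2 * covariance.
sums = [x + y for x, y in zip(xs, ys)]
prop3 = isclose(pvar(sums), pvar(xs) + pvar(ys) + 2 * pcov(xs, ys))

# Independent case: every x paired with every y, all pairs equally likely.
pairs = list(product(xs, ys))
ix = [x for x, _ in pairs]
iy = [y for _, y in pairs]
cov_indep = pcov(ix, iy)
prop3_indep = isclose(pvar([x + y for x, y in pairs]), pvar(ix) + pvar(iy))

print(prop1, prop2, prop3, prop3_indep)
```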
Expectation and variance of common random variables
Let $X$ be a random variable.
If $X$ follows the (0-1) distribution with parameter $p$, then $E(X) = p$ and $D(X) = p(1-p)$.
If $X$ follows the Poisson distribution, that is, $X \sim \pi(\lambda)$, then $E(X) = \lambda$ and $D(X) = \lambda$.
If $X$ is uniformly distributed, that is, $X \sim U(a, b)$, then $E(X) = (a+b)/2$ and $D(X) = (b-a)^2/12$.
If $X$ follows the exponential distribution, that is, $X \sim e(\lambda)$, then $E(X) = \lambda^{-1}$ and $D(X) = \lambda^{-2}$.
If $X$ follows the binomial distribution, that is, $X \sim B(n, p)$, then $E(X) = np$ and $D(X) = np(1-p)$.
If $X$ follows the normal distribution, that is, $X \sim N(\mu, \sigma^2)$, then $E(X) = \mu$ and $D(X) = \sigma^2$.
If $X$ follows the standard normal distribution, that is, $X \sim N(0, 1)$, then $E(X) = 0$ and $D(X) = 1$.
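The discrete entries in this list can be confirmed by summing over the probability mass function directly. The following sketch does this for the (0-1), binomial, and Poisson distributions; the parameter values are arbitrary, the function names are hypothetical, and the Poisson sum is truncated at an assumed cut-off `k_max`, beyond which the tail is negligible for the chosen $\lambda$.

```python
from math import comb, exp, factorial, isclose


def mean_and_variance(pmf):
    """Return (E(X), D(X)) for a dict mapping value -> probability."""
    mu = sum(k * p for k, p in pmf.items())
    var = sum((k - mu) ** 2 * p for k, p in pmf.items())
    return mu, var


def bernoulli_pmf(p):
    return {0: 1 - p, 1: p}


def binomial_pmf(n, p):
    return {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}


def poisson_pmf(lam, k_max=100):
    # Truncated at k_max (an assumption); the omitted tail is tiny for small lam.
    return {k: exp(-lam) * lam ** k / factorial(k) for k in range(k_max + 1)}


bern_mu, bern_var = mean_and_variance(bernoulli_pmf(0.3))      # expect p, p(1-p)
binom_mu, binom_var = mean_and_variance(binomial_pmf(10, 0.3))  # expect np, np(1-p)
pois_mu, pois_var = mean_and_variance(poisson_pmf(4.0))        # expect lam, lam

print(bern_mu, bern_var)
print(binom_mu, binom_var)
print(pois_mu, pois_var)
```

The continuous entries (uniform, exponential, normal) would need integration rather than summation and are not covered by this sketch.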
Application of statistics
Concept
The average of the squared differences between the data in a sample and the sample mean is called the sample variance; the arithmetic square root of the sample variance is called the sample standard deviation.
Sample variance and sample standard deviation are both measures of sample fluctuation. The greater the sample variance or standard deviation, the greater the fluctuation of the sample data.
Variance and standard deviation are the most important and most commonly used indicators of dispersion. The variance is the average of the squared deviations of each value from the mean, and it is the most important measure of the dispersion of numerical data. The standard deviation is the square root of the variance, denoted $S$, and the corresponding calculation formula is
$S = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}$
The difference between the standard deviation and the variance is that the standard deviation is expressed in the same units as the variable itself, which makes it easier to interpret than the variance, so the standard deviation is used more often in analysis.
College entrance examination example
(Gansu Province, 2002) Classes A and B of Grade Three at a school held a computer Chinese-character input speed competition. The number of Chinese characters each student entered per minute was recorded, and the resulting statistics are shown in the following table:
Class   Number of students   Average number of characters   Median   Variance
A       55                   135                            149      191
B       55                   135                            151      10
From the table above, a student draws the following conclusions:
① The average level of the students in Class A and Class B is the same;
② Class B has more excellent students than Class A (entering 150 or more Chinese characters per minute counts as excellent);
③ The results of the students in Class A fluctuate more than those in Class B.
The correct conclusions are _____ (fill in the serial numbers).
Solution: ①, ② and ③. Conclusions ① and ③ are clearly correct, since the two classes have the same average and the variance of Class A is greater than that of Class B. For conclusion ②, the median of Class A is 149, so fewer than half of its students are excellent, whereas the median of Class B is 151, so more than half of its students are excellent. Hence Class B has more excellent students than Class A.
Chebyshev inequality
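For any random variable $X$, if both $EX$ and $DX$ exist, then for any $\varepsilon > 0$,
$P\{|X-EX| \ge \varepsilon\} \le DX/\varepsilon^2$, or equivalently $P\{|X-EX| < \varepsilon\} \ge 1 - DX/\varepsilon^2$.
The smaller $DX$ is, the smaller this upper bound on $P\{|X-EX| \ge \varepsilon\}$, so the values of $X$ are concentrated near $EX$.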
At the same time, when $EX$ and $DX$ are known, Chebyshev inequality gives an upper bound on the probability $P\{|X-EX| \ge \varepsilon\}$. This bound does not depend on the specific probability distribution of the random variable $X$, only on its variance $DX$ and on $\varepsilon$. For this reason Chebyshev inequality is widely used in theory and practice. It should be pointed out, however, that the upper bound it gives is usually conservative in specific problems.
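How conservative the bound can be is easy to see on a concrete distribution. The sketch below assumes a fair six-sided die and $\varepsilon = 2$ (both chosen only for illustration) and compares the exact probability $P\{|X-EX| \ge \varepsilon\}$ with the Chebyshev bound $DX/\varepsilon^2$, using exact fractions.

```python
from fractions import Fraction

# Assumed example: a fair die, each face with probability 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

ex = sum(x * p for x in faces)              # EX
dx = sum((x - ex) ** 2 * p for x in faces)  # DX

eps = 2
# Exact probability that X is at least eps away from EX (faces 1 and 6).
exact = sum(p for x in faces if abs(x - ex) >= eps)
# Chebyshev upper bound on the same probability.
bound = dx / eps ** 2

print(exact, bound)
```

Here the exact probability is 1/3 while the bound is 35/48, more than twice as large. The bound is valid but far from tight, which is the price of not using any information about the distribution beyond $EX$ and $DX$.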
Chebyshev inequality means that in any data set, the proportion of data lying at least $k$ standard deviations away from the mean is at most $1/k^2$.
In probability theory, Chebyshev inequality shows that "almost all" values of a random variable are "close" to the mean, and it makes "almost all" and "close" quantitative:
The proportion of values that differ from the mean by at least 2 standard deviations does not exceed 1/4.
The proportion of values that differ from the mean by at least 3 standard deviations does not exceed 1/9.
The proportion of values that differ from the mean by at least 4 standard deviations does not exceed 1/16.
……
The proportion of values that differ from the mean by at least $k$ standard deviations does not exceed $1/k^2$.
For example, suppose a class has 36 students, the average score on an exam is 80, and the standard deviation is 10. A score below 50 is more than 3 standard deviations from the mean, so we can conclude that at most 4 students (= 36 × 1/9) scored below 50.
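The arithmetic of this example can be written as a small function. The name `chebyshev_max_count` and its parameters are hypothetical; exact fractions are used so that the final rounding down is not disturbed by floating-point error.

```python
from fractions import Fraction
from math import floor


def chebyshev_max_count(n, mean, std, threshold):
    """Largest number of the n values that can lie at least
    |threshold - mean| away from the mean, by Chebyshev inequality."""
    k = Fraction(abs(threshold - mean)) / Fraction(std)  # distance in standard deviations
    if k == 0:
        return n  # the inequality says nothing at distance zero
    proportion = min(Fraction(1), 1 / k ** 2)  # at most 1/k^2, never more than 1
    return floor(n * proportion)


# Class of 36 students, mean 80, standard deviation 10, threshold 50.
max_students = chebyshev_max_count(36, 80, 10, 50)
print(max_students)
```

Because the inequality is two-sided, the same count also bounds the number of scores below 50 and above 110 taken together.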