What is a normal distribution?

The normal distribution is a probability distribution of a continuous random variable with two parameters, μ and σ². The first parameter, μ, is the mean of a random variable that obeys the normal distribution, and the second, σ², is its variance, so the normal distribution is written N(μ, σ²). A random variable that obeys the normal distribution takes values close to μ with high probability and values far from μ with low probability; the smaller σ is, the more concentrated the distribution is around μ, and the larger σ is, the more dispersed it is. The density function is symmetric about μ, reaches its maximum at μ, tends to 0 as x tends to positive or negative infinity, and has inflection points at μ ± σ. Its graph is high in the middle and low on both sides, a bell-shaped curve lying above the x axis. When μ = 0 and σ² = 1 it is called the standard normal distribution, written N(0, 1).

When a multidimensional random vector follows a similar probability law, the vector is said to follow a multivariate normal distribution. The multivariate normal distribution has good properties: its marginal distributions are still normal, a random vector obtained from it by any linear transformation is still multivariate normal, and in particular any linear combination of its components follows a univariate normal distribution.
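The properties of the density function listed above can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names `normal_pdf` and `second_difference` and the values μ = 10, σ = 2 are assumptions chosen for illustration.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def second_difference(f, x: float, h: float = 1e-3) -> float:
    """Central-difference estimate of f''(x); its sign flips at an inflection point."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

mu, sigma = 10.0, 2.0  # illustrative values, not taken from the text

def f(x: float) -> float:
    return normal_pdf(x, mu, sigma)

# Symmetry about mu: equal heights at equal distances on either side.
print(f(mu - 1.5), f(mu + 1.5))

# Maximum at mu: the peak height is 1 / (sigma * sqrt(2 * pi)).
print(f(mu), 1.0 / (sigma * math.sqrt(2.0 * math.pi)))

# Inflection at mu + sigma: curvature is negative just inside, positive just outside.
print(second_difference(f, mu + sigma - 0.1), second_difference(f, mu + sigma + 0.1))
```

Because the peak height is inversely proportional to σ, a larger σ gives a lower, flatter curve, which matches the description of σ as controlling dispersion.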

The normal distribution was first obtained by A. de Moivre as an asymptotic formula for the binomial distribution. C. F. Gauss derived it from another angle when studying measurement error. Laplace and Gauss studied its properties.

The probability distributions of many random variables in production and scientific experiments can be approximately described by the normal distribution. Examples include, under unchanged production conditions, the strength, compressive strength, caliber, length and other indicators of a product; the body length, weight and other indicators of organisms of the same species; the weight of seeds of the same kind; the error in measuring the same object; the deviation of an impact point in a given direction; the annual precipitation in a given area; and the velocity components of ideal gas molecules. Generally speaking, if a quantity is the result of many tiny independent random factors, it can be considered to have a normal distribution (see the central limit theorem). Theoretically, the normal distribution has many good properties, and many probability distributions can be approximated by it. Some commonly used probability distributions are also derived directly from it, such as the lognormal distribution, the t distribution and the F distribution.
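The claim that a quantity built from many small independent factors is approximately normal can be illustrated with a small simulation. This is only a sketch under stated assumptions: the choice of 30 uniform factors, the sample size and the name `summed_effect` are all hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so that repeated runs give the same numbers

def summed_effect(n_factors: int = 30) -> float:
    """One observation formed as the sum of many small independent uniform factors."""
    return sum(random.uniform(-1.0, 1.0) for _ in range(n_factors))

samples = [summed_effect() for _ in range(20000)]
mean = statistics.fmean(samples)
sd = statistics.pstdev(samples)

# Share of observations within 1 and 1.96 standard deviations of the mean.
within_1sd = sum(abs(x - mean) <= sd for x in samples) / len(samples)
within_196sd = sum(abs(x - mean) <= 1.96 * sd for x in samples) / len(samples)

print(f"within 1 sd:    {within_1sd:.3f}")
print(f"within 1.96 sd: {within_196sd:.3f}")
```

Although each factor is uniform rather than normal, the two shares should come out close to the 68.27% and 95% expected for a normal distribution.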

Normal distribution is the most widely used continuous probability distribution, which is characterized by a bell-shaped curve.

Normal distribution

1. Normal distribution

If the density function (frequency curve) of a variable X is a normal function (curve), X is said to obey the normal distribution, written X ~ N(μ, σ²). Here μ and σ² are two undetermined constants, the parameters of the normal distribution; different values of μ and σ² correspond to different normal distributions.

The normal curve is bell-shaped, with low ends, high middle and symmetrical left and right. The area between the curve and the horizontal axis is always equal to 1.

2. The characteristics of normal distribution

The frequency distribution of a variable that obeys the normal distribution is completely determined by μ and σ.

(1) μ is the location parameter of the normal distribution and describes the position of its central tendency. The normal distribution is completely symmetrical about the axis x = μ. Its mean, median and mode coincide and are all equal to μ.

(2) σ describes the degree of dispersion of normally distributed data. The larger σ is, the more dispersed the data; the smaller σ is, the more concentrated the data. σ is also called the shape parameter of the normal distribution: the larger σ is, the flatter the curve, and the smaller σ is, the taller and thinner the curve.

Standard normal distribution

1. The standard normal distribution is a special normal distribution whose μ and σ² are 0 and 1 respectively. A variable that obeys the standard normal distribution is usually denoted u (or Z), written Z ~ N(0, 1).

2. Standardized transformation: if X obeys the normal distribution N(μ, σ²), then Z = (X − μ)/σ obeys the standard normal distribution N(0, 1), so probabilities for the original normal distribution can be obtained directly by looking up the standard normal distribution table. This transformation is therefore called the standardized transformation.
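As a minimal sketch of the standardized transformation, the snippet below uses `statistics.NormalDist` from the Python standard library in place of a printed table. The helper name `standardize` and the values μ = 50, σ = 10, x = 65 are assumptions made for illustration only.

```python
from statistics import NormalDist

def standardize(x: float, mu: float, sigma: float) -> float:
    """Standardized transformation z = (x - mu) / sigma."""
    return (x - mu) / sigma

mu, sigma = 50.0, 10.0  # hypothetical parameters of the original distribution
x = 65.0                # hypothetical observed value

z = standardize(x, mu, sigma)

# Probability that X <= x, obtained two ways: via the standard normal
# distribution after transformation, and directly from N(mu, sigma^2).
p_via_standard = NormalDist().cdf(z)
p_direct = NormalDist(mu, sigma).cdf(x)

print(z, p_via_standard, p_direct)
```

The two probabilities agree, which is why a single table for N(0, 1) is enough for every normal distribution.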

3. Standard normal distribution table

The standard normal distribution table lists the proportion of the area under the standard normal curve lying in the range from −∞ to x (the tabulated value).
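The entries of such a table can be reproduced from the error function. The sketch below is illustrative: the function name `phi` and the layout of the printed rows are assumptions, and a printed table may arrange its rows and columns differently.

```python
import math

def phi(u: float) -> float:
    """Area under the standard normal curve from -infinity to u."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

# Print a few rows: the row label gives u to one decimal place,
# and the ten columns add the second decimal place (0.00 to 0.09).
for tenths in range(0, 4):
    row = [phi(tenths / 10 + hundredths / 100) for hundredths in range(10)]
    print(f"{tenths / 10:.1f}  " + " ".join(f"{v:.4f}" for v in row))
```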

Area distribution under normal curve

1. In practical work, the area of an interval on the horizontal axis under the normal curve reflects the percentage of the number of cases in this interval to the total number of cases, or the probability (probability distribution) that the variable value falls within this interval. The area under normal curves in different ranges can be calculated by formulas.

2. Several important area ratios

The area between the horizontal axis and the normal curve is always equal to 1. Under the normal curve, the area over the interval (μ − σ, μ + σ) is 68.27%, the area over the interval (μ − 1.96σ, μ + 1.96σ) is 95.00%, and the area over the interval (μ − 2.58σ, μ + 2.58σ) is 99.00%.
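These area ratios can be verified with a few lines of code. The following is a sketch using `statistics.NormalDist`; the helper name `central_area` is an assumption. Because the area depends only on the multiplier of σ, the standard normal distribution suffices.

```python
from statistics import NormalDist

def central_area(k: float) -> float:
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    std = NormalDist()  # standard normal; the result is the same for any mu, sigma
    return std.cdf(k) - std.cdf(-k)

for k in (1.0, 1.96, 2.58):
    print(f"within {k} sigma of the mean: {central_area(k):.2%}")
```

Note that 2.58 is a rounded multiplier, so the computed area is marginally above 99%.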

Application of normal distribution

Some medical phenomena, such as the height of a homogeneous population, the red blood cell count, the amount of hemoglobin and the random error in experiments, show a normal or near normal distribution. Some indicators (variables) obey a skewed distribution, but the new variables obtained after data conversion may obey a normal or near normal distribution and can then be treated according to the normal distribution law. An indicator that obeys the normal distribution after logarithmic transformation is said to follow a lognormal distribution.

1. Estimation of frequency distribution: as long as the mean and standard deviation of a normally distributed variable are known, the frequency proportion in any range can be estimated from the formula.

2. Set the reference value range

(1) The normal distribution method is suitable for indicators that obey a normal (or near normal) distribution and for indicators that obey the normal distribution after conversion.

(2) The percentile method is commonly used for indicators with a skewed distribution. The one-sided and two-sided boundary values of the two methods in Table 3-1 should be mastered.

3. Quality control: to control measurement (or experimental) error, limits placed at fixed multiples of the standard deviation on either side of the mean are often used as the upper and lower warning values and the upper and lower control values. The basis for this is that, under normal circumstances, measurement (or experimental) error obeys the normal distribution.

4. The normal distribution is the theoretical basis of many statistical methods. Many statistical methods, such as the t test, analysis of variance, and correlation and regression analysis, require the analyzed indicators to obey the normal distribution. Other statistical methods do not require this, but their statistics are approximately normally distributed in large samples, so these inference methods also rest on the normal distribution.

Research process

The concept and characteristics of normal distribution

First, the concept of normal distribution:

A histogram drawn from frequency-table data typically shows, as in figure (1), a peak in the middle with roughly symmetrical left and right sides. If the number of observations is gradually increased and the class intervals are made finer, the line joining the tops of the histogram bars gradually approaches a smooth curve, as in figure (3), with its peak at the center (where the mean lies) and falling away symmetrically on both sides without ever meeting the horizontal axis. This curve is called the frequency curve, and it resembles the normal distribution in mathematics. Since the frequencies sum to 100% or 1, the area between the curve and the horizontal axis is 100% or 1.

For convenience of application, the normally distributed variable X is often transformed as u = (X − μ)/σ.

This transformation turns the original normal distribution into the standard normal distribution, also known as the u distribution. u is called the standard normal variable or standard normal deviate.

Second, the characteristics of normal distribution:

1. The normal curve lies above the horizontal axis and is highest at the mean.

2. The normal distribution is symmetrical with the average value as the center.

3. The normal distribution has two parameters, the mean μ and the standard deviation σ. μ is the location parameter: when σ is fixed, the larger μ is, the further the curve moves to the right along the horizontal axis, and the smaller μ is, the further it moves to the left. σ is the shape parameter: when μ is fixed, the larger σ is, the wider and flatter the curve, and the smaller σ is, the narrower and steeper the curve. N(μ, σ²) is usually used to denote a normal distribution with mean μ and variance σ². N(0, 1) denotes the standard normal distribution.

4. The distribution of area under normal curve has certain regularity.

In practical work, it is often necessary to know the area under the normal curve over an interval of the horizontal axis, as a percentage of the total area, in order to estimate the percentage of cases in that interval (the frequency distribution) or the probability that an observed value falls within it. The area of an interval under the normal curve can be obtained from attached Table 1. For data with a normal or near normal distribution, the frequency distribution can be roughly estimated once the mean and standard deviation are known.

The following points should be noted when looking up attached Table 1: ① The area under the curve given in the table is the cumulative area from −∞ up to u. ② When μ, σ and X are known, calculate u from the formula u = (X − μ)/σ and then look up the table. When μ and σ are unknown and the sample size n is large enough, μ and σ can be replaced by the sample mean x̄ and the sample standard deviation s respectively, so that u = (X − x̄)/s, and the table is then looked up. ③ Areas over intervals that are symmetric about 0 are equal, for example those over (−∞, −1.96) and (1.96, ∞), and the total area between the curve and the horizontal axis is 100% or 1.
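Two of the points above lend themselves to a quick check: the equality of symmetric tail areas, and the replacement of μ and σ by sample estimates. The snippet is a sketch only. The ten values in `sample` are invented for illustration, and a sample this small would not justify the substitution in practice, since the text requires a large n.

```python
from statistics import NormalDist, fmean, stdev

std = NormalDist()

# Areas over intervals symmetric about 0 are equal.
left_tail = std.cdf(-1.96)         # area over (-infinity, -1.96)
right_tail = 1.0 - std.cdf(1.96)   # area over (1.96, infinity)
print(left_tail, right_tail)

# mu and sigma replaced by the sample mean and sample standard deviation.
sample = [4.6, 5.1, 4.9, 5.4, 5.0, 4.8, 5.2, 5.3, 4.7, 5.0]  # hypothetical data
x_bar = fmean(sample)
s = stdev(sample)

x = 5.5                            # hypothetical value of interest
u = (x - x_bar) / s
print(u, std.cdf(u))               # cumulative area to the left of u
```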

Fig. 2 Area distribution of normal curve and standard normal curve

Section 2 Application of Normal Distribution

Some medical phenomena, such as the height of a homogeneous population, the red blood cell count, the amount of hemoglobin and cholesterol, as well as the random error in experiments, show a normal or nearly normal distribution. Some data are skewed but become normal or close to normal after data conversion, so they can also be treated according to the normal distribution law.

1. Estimate the frequency distribution of normally distributed data.

Example 1.10: In 1993, the heights (cm) of 100 male college students aged 18 in a certain area were sampled, with mean x̄ = 172.70 cm and standard deviation s = 4.01 cm. ① Estimate the percentage of 18-year-old male college students in the area whose height is below 168 cm. ② Find the actual percentages of 18-year-old male college students in the ranges x̄ ± 1s, x̄ ± 1.96s and x̄ ± 2.58s, and compare them with the theoretical percentages.

In this case μ and σ are unknown, but the sample size n is large. According to formula (3.1), replace μ and σ with the sample mean x̄ and standard deviation s respectively to obtain u = (168 − 172.70)/4.01 = −1.17. In the attached table of areas under the standard normal curve, find −1.1 at the left side of the table and 0.07 at the top; the entry where they intersect is 0.1210 = 12.10%. Male college students aged 18 in this area with a height below 168 cm therefore account for about 12.10% of the total. The other results are shown in Table 3.
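The calculation in this example can be reproduced as follows. This is a sketch in which `statistics.NormalDist` stands in for the printed table; rounding u to two decimals mirrors the way a table with one-decimal rows and second-decimal columns is read.

```python
from statistics import NormalDist

x_bar, s = 172.70, 4.01   # sample mean and standard deviation from the example (cm)
cutoff = 168.0            # height threshold from the example (cm)

# A printed table is indexed by u to two decimal places, so round accordingly.
u = round((cutoff - x_bar) / s, 2)

# Cumulative area to the left of u under the standard normal curve.
share_below = NormalDist().cdf(u)

print(u, f"{share_below:.2%}")
```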

Table 3 Actual and theoretical height distributions of 100 male college students aged 18

x̄ ± s        Height range (cm)    Actual number of people    Actual percentage (%)    Theoretical percentage (%)
x̄ ± 1s       168.69 ~ 176.71      67                         67.00                    68.27
x̄ ± 1.96s    164.84 ~ 180.56      95                         95.00                    95.00
x̄ ± 2.58s    162.35 ~ 183.05      99                         99.00                    99.00
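The height ranges in Table 3 follow directly from the sample mean and standard deviation of the example. The sketch below recomputes them and sets the actual percentages beside the theoretical ones. The sample size of 100 is inferred from the table itself, where each count equals its percentage; the dictionary and function names are assumptions.

```python
from statistics import NormalDist

x_bar, s = 172.70, 4.01                         # from the example (cm)
n = 100                                         # inferred: counts equal percentages
observed_counts = {1.0: 67, 1.96: 95, 2.58: 99} # actual counts reported in Table 3

def interval(k: float) -> tuple:
    """Range x_bar - k*s to x_bar + k*s, rounded to two decimals."""
    return (round(x_bar - k * s, 2), round(x_bar + k * s, 2))

def theoretical_pct(k: float) -> float:
    """Percentage of a normal distribution lying within k standard deviations."""
    std = NormalDist()
    return 100.0 * (std.cdf(k) - std.cdf(-k))

for k, count in observed_counts.items():
    low, high = interval(k)
    actual = 100.0 * count / n
    print(f"{k:>4}s  {low} ~ {high}  actual {actual:.2f}%  theoretical {theoretical_pct(k):.2f}%")
```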

2. Formulate the medical reference value range, also called the medical normal value range. It refers to the range within which the anatomical, physiological and biochemical indexes of so-called "normal people" fluctuate. When setting a normal range, first identify a group of "normal people" with a sufficiently large sample size. "Normal people" here does not mean "healthy people"; it means a homogeneous population from which the diseases and related factors that affect the indicator under study have been excluded. Second, choose an appropriate percentage range according to the research purpose and intended use, such as 80%, 90%, 95% or 99%; 95% is the most common. Third, decide between one-sided and two-sided boundary values according to how the indicator is used. For example, a white blood cell count is abnormal when it is either too high or too low, so two-sided boundary values are determined; liver transaminase is abnormal only when too high, so a one-sided upper limit is determined; vital capacity is abnormal only when too low, so a one-sided lower limit is determined. Finally, choose an appropriate calculation method according to the distribution characteristics of the data. Commonly used methods are:

(1) Normal distribution method: suitable for data with a normal or near normal distribution.

Two-sided boundary values: x̄ ± u·s; one-sided upper bound: x̄ + u·s, or one-sided lower bound: x̄ − u·s, where u is the u value for the chosen reference range.

(2) Lognormal distribution method: suitable for lognormally distributed data.

Two-sided boundary values: lg⁻¹[x̄(lg x) ± u·s(lg x)]; one-sided upper bound: lg⁻¹[x̄(lg x) + u·s(lg x)], or one-sided lower bound: lg⁻¹[x̄(lg x) − u·s(lg x)], where x̄(lg x) and s(lg x) are the mean and standard deviation of the log-transformed values and lg⁻¹ denotes the antilogarithm.

According to needs, commonly used u values can be found in Table 4.

(3) Percentile method: it is often used for skewed distribution data and data with no exact value at one or both ends.

Two-sided boundary values: P2.5 and P97.5; one-sided upper limit: P95; or one-sided lower limit: P5.
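The three methods above can be sketched for a two-sided 95% range as follows. This is an illustration under stated assumptions rather than a validated procedure: the function names are hypothetical, the data are simulated lognormal values from a seeded generator, and the percentile estimates depend on the interpolation rule chosen (here the `inclusive` method of `statistics.quantiles`).

```python
import math
import random
from statistics import NormalDist, fmean, stdev, quantiles

def normal_range(data, coverage=0.95):
    """Two-sided range: mean +/- u * s, with u taken from the standard normal."""
    u = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    m, s = fmean(data), stdev(data)
    return m - u * s, m + u * s

def lognormal_range(data, coverage=0.95):
    """Apply the normal method to lg(x), then take antilogarithms of the limits."""
    logs = [math.log10(x) for x in data]
    low, high = normal_range(logs, coverage)
    return 10.0 ** low, 10.0 ** high

def percentile_range(data):
    """Two-sided 95% range from the 2.5th and 97.5th percentiles."""
    cuts = quantiles(data, n=40, method="inclusive")  # cut points every 2.5%
    return cuts[0], cuts[-1]

random.seed(1)
data = [random.lognormvariate(0.0, 0.5) for _ in range(500)]  # hypothetical skewed data

norm_lo, norm_hi = normal_range(data)
log_lo, log_hi = lognormal_range(data)
p_lo, p_hi = percentile_range(data)

print("normal method:    ", norm_lo, norm_hi)
print("lognormal method: ", log_lo, log_hi)
print("percentile method:", p_lo, p_hi)
```

For skewed data such as these, the plain normal method can place the lower limit at an implausible value, while the lognormal and percentile methods stay within the range of the data. A one-sided limit would use `inv_cdf(coverage)` instead of `inv_cdf((1 + coverage) / 2)`.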

Table 4 Commonly used u values

Reference range (%)    One-sided    Two-sided
80                     0.842        1.282
90                     1.282        1.645
95                     1.645        1.960
99                     2.326        2.576
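The u values in Table 4 are quantiles of the standard normal distribution and can be recomputed as a check. In this sketch the helper name `u_values` is an assumption: a one-sided limit leaves the whole excluded area in one tail, while a two-sided limit splits it equally between both tails.

```python
from statistics import NormalDist

def u_values(range_pct: float) -> tuple:
    """One-sided and two-sided u values for a reference range given in percent."""
    p = range_pct / 100.0
    std = NormalDist()
    one_sided = std.inv_cdf(p)                # all excluded area in one tail
    two_sided = std.inv_cdf((1.0 + p) / 2.0)  # excluded area split between tails
    return round(one_sided, 3), round(two_sided, 3)

for pct in (80, 90, 95, 99):
    one, two = u_values(pct)
    print(f"{pct}%  one-sided {one:.3f}  two-sided {two:.3f}")
```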

3. The normal distribution is the theoretical basis of many statistical methods: for example, the t distribution, F distribution and χ² distribution are all derived from the normal distribution, and the u test is also based on it. In addition, the t distribution, binomial distribution and Poisson distribution all have the normal distribution as their limit, so under certain conditions they can be treated according to the principles of the normal distribution.
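As one illustration of a limiting case, the sketch below compares an exact binomial probability with its normal approximation. The values n = 100, p = 0.5, k = 55 and the function names are assumptions, and the continuity correction of 0.5 is a common convention that the text itself does not mention.

```python
import math
from statistics import NormalDist

def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X following a binomial distribution B(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Normal approximation to P(X <= k), with a continuity correction of 0.5."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)

n, p, k = 100, 0.5, 55  # hypothetical values
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(exact, approx)
```

The approximation is closest when n is large and p is not near 0 or 1, which is the kind of condition the text refers to.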