Probability Theory for Data Science II: Calculating Probabilities

Once you start working with probability, you soon realize that the assumption that all possible outcomes are equally likely is not always reasonable. For example, if you think a coin is biased, you won't assume that both of its faces have the same chance of showing.

To deal with situations in which some outcomes are more likely than others, a more general theory is needed. In the 1930s, the Russian mathematician Andrey Kolmogorov (1903-1987) put forward some basic rules, called axioms, which cover a rich array of situations and have become the foundation of modern probability theory.

The axioms begin with an outcome space Ω. For now, assume that Ω is finite. Probability is a function P defined on events, which, as you know, are subsets of Ω. The first two axioms just set the scale of measurement: they define probabilities to be numbers between 0 and 1. The first says that $P(A) \ge 0$ for every event A, and the second says that $P(\Omega) = 1$.

The third and final axiom is the key to probability being a "measure" of events. We will study it after we have developed some relevant terminology.

The third axiom is about mutually exclusive events. Informally, two events A and B are mutually exclusive if at most one of them can happen; in other words, they cannot both happen at the same time.

For example, suppose you select a student at random from a class in which 40% of the students are freshmen and 20% are sophomores. Each student is a freshman, a sophomore, or neither, but no student is both a freshman and a sophomore. So if A is the event "the selected student is a freshman" and B is the event "the selected student is a sophomore", then A and B are mutually exclusive.

What's the big deal about mutually exclusive events? To understand this, first consider the event that the selected student is a freshman or a sophomore. In the language of set theory, this is the union of the two events "freshman" and "sophomore". It is a good idea to use Venn diagrams to visualize events. In the figure below, suppose A and B are two mutually exclusive events, shown as the blue and gold circles respectively. Because the events are mutually exclusive, the corresponding circles do not overlap. The union is the set of all points in the two circles.

What is the chance that the student is a freshman or a sophomore? In the class, 40% are freshmen and 20% are sophomores, so the natural answer is 60%. That is the percentage of students who satisfy our criterion of "freshman or sophomore". Simple addition works because the two groups are disjoint.

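Kolmogorov used this idea to formulate the third and most important axiom of probability. Formally, events A and B are mutually exclusive if their intersection is empty:

$A \cap B = \emptyset$

In the context of a finite outcome space, the axiom says that if A and B are mutually exclusive events, then:

$P(A \cup B) = P(A) + P(B)$

You will show in an exercise that this axiom implies something more general: for any fixed n, if $A_1, A_2, \ldots, A_n$ are mutually exclusive (that is, $A_i \cap A_j = \emptyset$ for all $i \ne j$), then:

$P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n)$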

This seemingly simple axiom has great power, especially when it is extended to countably many mutually exclusive events. To start with, it can be used to create some handy computational tools.

Suppose that 50% of the students in a class have data science as one of their majors, and 40% are majoring in both data science and computer science (CS). If you pick a student at random, what is the chance that the student is majoring in data science but not in CS?

The Venn diagram below shows a dark blue circle corresponding to the event A (data science as one of the majors) and a gold circle corresponding to the event B (majoring in both data science and CS); it is not drawn to scale. The two events are nested, because B is a subset of A: everyone in B has data science as one of their majors.

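Therefore $B \subseteq A$.

The students majoring in data science but not CS form the difference "A and not B", written $A \setminus B = A \cap B^c$, where $B^c$ is the complement of B.

What is the chance that the student is in the light blue difference? If you answered "50% - 40% = 10%", you are right. Your intuition tells you that probabilities behave like areas. They do. In fact, the calculation follows from the axiom of additivity, which was itself motivated by thinking of probabilities as areas.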

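Suppose A and B are events with $B \subseteq A$. Then $A = B \cup (A \setminus B)$.

This is a disjoint union. By the addition axiom:

$P(A) = P(B) + P(A \setminus B)$

So,

$P(A \setminus B) = P(A) - P(B)$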

If an event has a 40% chance of happening, what is the chance that it doesn't happen? The "obvious" answer of 60% is a special case of the difference rule, $P(A \setminus B) = P(A) - P(B)$ for $B \subseteq A$.

For any event B,

$P(B^c) = 1 - P(B)$

Proof. The Venn diagram below shows what to do. Take A = Ω in the difference rule and remember the second axiom, $P(\Omega) = 1$.

When you see a minus sign in a probability calculation, as in the complement rule above, you will often find that it comes from rearranging terms in an application of the addition rule.

When you add or subtract probabilities, you are implicitly splitting an event into disjoint pieces. This is called partitioning the event, and it is a fundamental technique to master. In the chapters that follow, you will see many uses of partitioning.

Let's see if we can use the results we have developed to calculate some probabilities. Some steps can be understood without calculation; others need more work.

Example 1: Heads and tails in n tosses.

Toss a coin n times so that all $2^n$ possible sequences of heads and tails are equally likely.

What is the chance of getting at least one head and at least one tail?

Answer. There are many sequences in which each face appears at least once. For example, if n = 4, such sequences include HTTT, HTHT, TTHT, and so on.

Problem-solving technique: when an event can happen in many different ways, it can be a good idea to look at the ways in which it does not happen, because there may be fewer of them.

For n = 4, the only sequences in which the two faces do not each appear at least once are HHHH and TTTT. In fact, for any n there are just two sequences in which we do not get both faces: all heads and all tails. These are the two sequences whose elements are all the same.

Let A be the event "we get at least one head and at least one tail". The question asks for P(A). Because $A^c = \{HH \cdots H, \; TT \cdots T\}$, we have $P(A^c) = \frac{2}{2^n}$.

By the complement rule:

$P(A) = 1 - \frac{2}{2^n} = 1 - \frac{1}{2^{n-1}}$

Note that as n gets larger, the answer approaches 1. With many tosses, you are almost certain to see both heads and tails.
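The complement-rule answer can be checked by brute force for small n. The following is a minimal sketch, not part of the original text; the function name `chance_both_faces` is hypothetical, and it assumes all sequences of H and T are equally likely, as in the example.

```python
from fractions import Fraction
from itertools import product


def chance_both_faces(n):
    """Chance of at least one head and at least one tail in n tosses,
    found by listing all 2**n equally likely sequences."""
    sequences = list(product("HT", repeat=n))
    favorable = [s for s in sequences if "H" in s and "T" in s]
    return Fraction(len(favorable), len(sequences))


for n in (2, 4, 10):
    by_complement_rule = 1 - Fraction(2, 2**n)
    # The two columns should agree for every n.
    print(n, chance_both_faces(n), by_complement_rule)
```

Enumeration is only practical for small n, which is exactly why the complement rule is the better tool here.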

Roll a die 12 times so that all $6^{12}$ sequences of faces are equally likely.

Question 1. What is the chance that the maximum is less than 5?

Answer 1. The key is to observe that the event "the maximum is less than 5" is the same as the event "all 12 faces are less than 5". For this to happen, each of the 12 rolls must show one of the four values 1 through 4. So:

$P(\text{maximum} < 5) = \frac{4^{12}}{6^{12}} = \left(\frac{4}{6}\right)^{12}$

Yes, this can be simplified further, but we won't do that, for a reason that will become clear shortly.

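Question 2. What is the chance that the maximum is less than 4?

Answer 2. There is nothing new here, except that the 5 in Question 1 is replaced by 4, so each roll must show one of the three values 1 through 3:

$P(\text{maximum} < 4) = \left(\frac{3}{6}\right)^{12}$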

Question 3. What is the chance that the maximum is equal to 4?

Answer 3. It is not easy to write down all the sequences whose maximum equals 4. Let's see if we can use what we already know. For the maximum to equal 4:

The maximum must be less than 5.

And it cannot be less than 4.

We are viewing the set {4} as a difference: {1, 2, 3, 4} \ {1, 2, 3}.

So, by the difference rule,

$P(\text{maximum} = 4) = P(\text{maximum} < 5) - P(\text{maximum} < 4) = \left(\frac{4}{6}\right)^{12} - \left(\frac{3}{6}\right)^{12}$

There is nothing special about 12 rolls. You can replace 12 by any n throughout, and the argument goes through as before.

The maximum is one example of an extreme value; the minimum is another.

Problem-solving technique: when you work with extreme values, remember the observation we used in this example: saying that the maximum is small is equivalent to saying that all the elements are small. Similarly, saying that the minimum is large is equivalent to saying that all the elements are large.
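The extreme-value argument can be sketched in code and checked against enumeration for a small number of rolls. This is an illustrative sketch only; the function names and the `faces` parameter are hypothetical, and a fair six-sided die is assumed.

```python
from fractions import Fraction
from itertools import product


def chance_max_equals(k, n, faces=6):
    """P(maximum of n rolls == k), computed as
    P(all rolls <= k) - P(all rolls <= k - 1)."""
    return Fraction(k, faces) ** n - Fraction(k - 1, faces) ** n


def chance_max_equals_brute(k, n, faces=6):
    """Same chance by listing every sequence; feasible only for small n."""
    rolls = product(range(1, faces + 1), repeat=n)
    hits = sum(1 for r in rolls if max(r) == k)
    return Fraction(hits, faces**n)


# Enumerating 6**12 sequences is slow, so compare the two methods on 3 rolls.
print(chance_max_equals(4, 3), chance_max_equals_brute(4, 3))

# The formula itself handles 12 rolls directly.
print(float(chance_max_equals(4, 12)))
```

Summing `chance_max_equals(k, n)` over k from 1 to 6 gives 1, because the possible values of the maximum partition the outcome space.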

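A random number generator draws two digits so that all 100 pairs of digits are equally likely.

What is the chance that the second digit is greater than the first?

Answer, Method 1 - partitioning: make an organized list of all the ways in which the event can occur. A good way to list the pairs in which the second digit is greater than the first is to partition them according to the value of the first digit. If the first digit is 0, the second can be any of 1 through 9 (9 pairs); if the first digit is 1, the second can be any of 2 through 9 (8 pairs); and so on, down to a first digit of 8, for which the second must be 9 (1 pair).

This partition makes the counting easy. Among the 100 possible pairs, there are 9+8+7+6+5+4+3+2+1 = (9×10)/2 = 45 pairs in which the second digit is greater than the first. So the answer is 0.45.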

Answer, Method 2 - symmetry: convince yourself by symmetry that the chance that the second digit is greater than the first is the same as the chance that the first digit is greater than the second. One way is to partition the latter event according to the value of the second digit and note the correspondence with the partition in Method 1.

So if p = P(second digit > first digit), the addition rule says:

$1 = p + p + P(\text{the two digits are equal}) = 2p + \frac{10}{100}$

because there are 10 pairs with equal digits: 00, 11, 22, ..., 99. Now solve for p:

$p = \frac{1}{2}(1 - 0.1) = 0.45$

just as before.

Both methods are worth learning. Partitioning and symmetry will come up throughout the course.
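Both methods are easy to mirror in a few lines of code. The sketch below is an illustration rather than part of the original text; the variable names are hypothetical, and it assumes the 100 ordered pairs of digits 0 through 9 are equally likely.

```python
from fractions import Fraction
from itertools import product

pairs = list(product(range(10), repeat=2))  # all 100 (first, second) pairs

# Method 1: count the pairs in which the second digit is greater.
greater = [(a, b) for a, b in pairs if b > a]
p_partition = Fraction(len(greater), len(pairs))

# Method 2: by symmetry, 1 = 2p + P(digits equal), so solve for p.
equal = [(a, b) for a, b in pairs if a == b]
p_symmetry = (1 - Fraction(len(equal), len(pairs))) / 2

print(p_partition, p_symmetry)
```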

The main axiom of probability is about mutually exclusive events. As it turns out, we don't need any other axioms to deal with intersection events.

Let A and B be two events. The intersection A ∩ B is the event that both A and B occur; it is shown in bright blue in the Venn diagram on the right.

Because intersections come up all the time, we will be a little lazy in our notation: we will write AB for the intersection instead of using the symbol ∩. You must remember that AB is an event, not a product.

Here is an example to help motivate some definitions we are about to make.

Suppose I have a small deck of cards consisting of one red card, one green card, and one blue card. Suppose I shuffle the deck, draw one card, shuffle the remaining two, and then draw one of them. This is called drawing two cards at random without replacement.

A reasonable outcome space is Ω = {RG, RB, GB, GR, BR, BG}, with all six elements equally likely.

The chance that we get the green card first and then the red card is the chance of the single sequence GR:

$P(GR) = \frac{1}{6}$

This simple calculation contains something more interesting. Note that:

$\frac{1}{6} = \frac{1}{3} \cdot \frac{1}{2}$

The first factor, 1/3, is the chance that the first card is G. What is the second factor, 1/2? To understand it, look at the pairs that start with G, namely GR and GB. Among these, only one has R as the second card. The second factor in the product is:

$\frac{\#\{GR\}}{\#\{GR, GB\}} = \frac{1}{2}$

This fraction is called the conditional probability that the second card is R given that the first card is G.

It is denoted P(second card R | first card G). That is a vertical bar, not a slash.

Now our original calculation can be written one card at a time:

$P(GR) = P(\text{first card G}) \cdot P(\text{second card R} \mid \text{first card G}) = \frac{1}{3} \cdot \frac{1}{2} = \frac{1}{6}$

Calculations like the one above motivate a new definition. Let A and B be two events. Then the conditional probability of B given A is defined by the division rule:

$P(B \mid A) = \frac{P(AB)}{P(A)}$

There is some abuse of notation here. B | A is not an event. But the notation is convenient. The whole left-hand side should be read as "the probability that B happens given that A has happened".

The reasoning behind the definition: A is given, so restrict your attention to the outcomes in A. That is now your whole space, so all probabilities must be calculated relative to P(A). What is the chance that B happens now? The answer is P(AB)/P(A).

Since we are dividing by P(A), you should be careful. You may wonder what happens if P(A) = 0. In that case we will never be given A, because A won't happen. So we don't have to worry about this.
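To see the division rule at work on the three-card deck, here is a minimal sketch that computes probabilities by counting outcomes. It is an illustration under the equally-likely assumption of the example; the helper names `prob` and `cond_prob` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Outcome space for two draws without replacement from R, G, B:
# the 6 equally likely ordered pairs.
omega = list(permutations("RGB", 2))


def prob(event):
    """P(event), where event is a true/false function of an outcome."""
    return Fraction(sum(1 for o in omega if event(o)), len(omega))


def cond_prob(b, a):
    """Division rule: P(B | A) = P(AB) / P(A). Requires P(A) > 0."""
    return prob(lambda o: a(o) and b(o)) / prob(a)


def first_is_green(outcome):
    return outcome[0] == "G"


def second_is_red(outcome):
    return outcome[1] == "R"


print(cond_prob(second_is_red, first_is_green))
print(prob(first_is_green) * cond_prob(second_is_red, first_is_green))
```

The second printed value multiplies P(first G) by P(second R | first G), which recovers the chance of the single sequence GR.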

Multiplication rule:

This is just a rearrangement of the definition of conditional probability, but it may be the most commonly used rule of all.

Let A and B be two events. Then the chance that both occur is:

$P(AB) = P(A) P(B \mid A)$

Note that the answer is "a fraction of a fraction". The chance that both A and B occur is no greater than the chance of A: the more conditions an event must satisfy, the smaller its chance.

Because $AB \subseteq B$, you know that P(AB) is also no greater than P(B). You should also check that:

$P(AB) = P(B) P(A \mid B)$

We will end this section with some simple examples. The next section contains examples that need more work.

A standard deck consists of 52 cards, 4 of which are aces. Two cards are dealt at random without replacement.

Question 1. Given that the first card is an ace, what is the chance that the second card is an ace?

Answer 1. 3/51, because the deck now has 51 cards, three of which are aces.

Question 2. What is the chance that both cards are aces?

Answer 2. By the multiplication rule and Answer 1, the answer is:

$P(\text{both aces}) = \frac{4}{52} \cdot \frac{3}{51}$

Question 3. If the cards are dealt with replacement, how do the answers to Questions 1 and 2 change?

Answer 3. (Who deals cards with replacement? Only in probability class...) Dealing with replacement means the first card is put back before the second is drawn. Under this assumption you are drawing from the same deck each time, so:

$P(\text{second card ace} \mid \text{first card ace}) = \frac{4}{52}$

Whatever the first card is, the answer is the same. Also:

$P(\text{both aces}) = \frac{4}{52} \cdot \frac{4}{52}$

Note that changing the nature of the randomization does not change whether you multiply probabilities. You are still finding the probability of an intersection, so you multiply. The change in assumptions only changes what you multiply.
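The two sampling schemes differ only in the conditional factor, which a short sketch makes explicit. The function name `both_aces` and its parameters are hypothetical, introduced here for illustration.

```python
from fractions import Fraction


def both_aces(replacement, deck=52, aces=4):
    """P(both cards are aces) by the multiplication rule:
    P(first ace) * P(second ace | first ace)."""
    first = Fraction(aces, deck)
    if replacement:
        # The first card goes back, so the deck is unchanged.
        second_given_first = Fraction(aces, deck)
    else:
        # One ace and one card are gone.
        second_given_first = Fraction(aces - 1, deck - 1)
    return first * second_given_first


print(both_aces(replacement=False))
print(both_aces(replacement=True))
```

In both branches the code multiplies; only the second factor depends on the assumption about replacement.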

According to Census estimates that you saw in Data 8, the population of the United States in 2014 was 318,857,056. Of these, 9,037 were 99-year-old males and 32,791 were 99-year-old females.

Suppose you pick a person at random from the 2014 US population, and the person turns out to be 99 years old. Given this information, what is the chance that the person is a woman?

Answer. The natural answer is the proportion of women among the 99-year-olds:

$\frac{32791}{9037 + 32791} \approx 0.784$

This is consistent with the definition of conditional probability, according to which you should calculate:

$P(\text{female} \mid \text{99 years old}) = \frac{32791 / 318857056}{(9037 + 32791) / 318857056}$

You don't need the total US population; it cancels. This is an important observation about conditioning. When you sample at random and you know that your selection is in a particular subgroup, only the counts within that subgroup matter.

Given the age of 99, the person is almost four times as likely to be a woman as a man. But as you saw in Data 8, among our youngest residents, the newborns, there are more males than females.
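The cancellation of the total population can be confirmed numerically. This is a small sketch with hypothetical variable names; the female count of 32,791 is an assumption, read from the garbled figure in the source and chosen because it matches the "almost four times" comparison above.

```python
from fractions import Fraction

population = 318_857_056
males_99 = 9_037
females_99 = 32_791  # assumed reading of the garbled source figure

# Division rule: P(female | 99) = P(female and 99) / P(99)
p_99 = Fraction(males_99 + females_99, population)
p_female_and_99 = Fraction(females_99, population)
p_female_given_99 = p_female_and_99 / p_99

# The same value comes from the subgroup counts alone.
within_subgroup = Fraction(females_99, males_99 + females_99)

print(p_female_given_99 == within_subgroup)
print(round(float(p_female_given_99), 3))
print(round(females_99 / males_99, 2))  # ratio of women to men at age 99
```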

All you need is the addition rule and the multiplication rule. Here are some examples of standard problem-solving techniques.

A box contains 6 dark chocolates and 4 milk chocolates. I pick two at random without replacement.

Question. What is the chance that I get one of each kind?

Answer. Notice that the question does not say whether the dark or the milk chocolate comes first. Either can happen. So list the different ways in which the event can occur, that is, partition the event:

Dark first and then milk: by the multiplication rule, the chance is (6/10)(4/9).

Milk first and then dark: the chance is (4/10)(6/9).

(Aha! These two products are the same! Be prepared for more such symmetries in sampling without replacement.)

Now add the two chances. The answer is 2(6/10)(4/9) = 8/15.

This method should be as natural as breathing. You should redo the problem under the unnatural assumption that the chocolates are sampled with replacement, to see what changes and what stays the same.
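The partition-then-multiply answer can be compared with a direct count of ordered draws. A minimal sketch, with hypothetical variable names, assuming each ordered pair of distinct chocolates is equally likely:

```python
from fractions import Fraction
from itertools import permutations

box = ["dark"] * 6 + ["milk"] * 4

# Ordered pairs of distinct positions model two draws without replacement.
draws = list(permutations(range(len(box)), 2))
one_of_each = sum(1 for i, j in draws if box[i] != box[j])
by_enumeration = Fraction(one_of_each, len(draws))

# Partition by which kind comes first, multiply within each piece, then add.
dark_then_milk = Fraction(6, 10) * Fraction(4, 9)
milk_then_dark = Fraction(4, 10) * Fraction(6, 9)
by_rules = dark_then_milk + milk_then_dark

print(by_enumeration, by_rules)
```

Replacing `permutations(range(len(box)), 2)` with `itertools.product(range(len(box)), repeat=2)` would model sampling with replacement, which is one way to carry out the suggested exercise.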

An urn contains b black balls and w white balls. A ball is drawn at random, then put back along with d more balls of the same color as the one drawn. Then a ball is drawn at random from the urn.

Question 1. What is the chance that the first ball drawn is black?

Answer 1. That one doesn't take much effort:

$P(\text{first black}) = \frac{b}{b+w}$

Question 2. What is the chance that the second ball drawn is black?

Answer 2. You will naturally ask what the first ball was. Partition according to the color of that ball and add. The basic method works again:

$P(\text{second black}) = \frac{b}{b+w} \cdot \frac{b+d}{b+w+d} + \frac{w}{b+w} \cdot \frac{b}{b+w+d} = \frac{b(b+w+d)}{(b+w)(b+w+d)} = \frac{b}{b+w}$

This is the same as the chance that the first ball is black, no matter what d is. That is a rather interesting result!

Question 3. Given that the first ball is black, what is the chance that the second ball is black?

Answer 3. We used this in the calculation above. Conditional probabilities that go "forward in time" can usually be read off the information in the question:

$P(\text{second black} \mid \text{first black}) = \frac{b+d}{b+w+d}$

Question 4. Given that the second ball is black, what is the chance that the first ball is black?

Answer 4. This "backward in time" conditional probability is not so easy to read off. This is where the division rule comes in:

$P(\text{first black} \mid \text{second black}) = \frac{P(\text{first black and second black})}{P(\text{second black})} = \frac{\frac{b}{b+w} \cdot \frac{b+d}{b+w+d}}{\frac{b}{b+w}} = \frac{b+d}{b+w+d}$

This does depend on d, and it is the same as Answer 3. Going forward or backward in time makes no difference.

Now you are starting to see why this urn scheme, known as Pólya's urn, is named after its famous originator George Pólya (1887-1985). You can keep repeating the scheme (put the drawn ball back along with d more of the same color, then draw again) to obtain a beautiful and useful property for updating your opinions as data come in. We will see it later in this course.
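The four answers for the urn can be computed exactly for any particular b, w, and d. The sketch below is illustrative; the function name `polya_two_draws` is hypothetical, and it assumes b and w are positive integers and d is a non-negative integer.

```python
from fractions import Fraction


def polya_two_draws(b, w, d):
    """Exact chances for the first two draws from the urn described above."""
    total = b + w
    p_b1 = Fraction(b, total)
    p_w1 = Fraction(w, total)

    # "Forward in time" conditionals, read off the description of the urn.
    p_b2_given_b1 = Fraction(b + d, total + d)
    p_b2_given_w1 = Fraction(b, total + d)

    # Partition on the first draw, then add.
    p_b2 = p_b1 * p_b2_given_b1 + p_w1 * p_b2_given_w1

    # "Backward in time" conditional, by the division rule.
    p_b1_given_b2 = (p_b1 * p_b2_given_b1) / p_b2

    return p_b1, p_b2, p_b2_given_b1, p_b1_given_b2


for d in (0, 1, 5):
    print(d, polya_two_draws(3, 2, d))
```

For each d, the first two returned values agree with each other, and so do the last two, matching Answers 2 and 4.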

Data can change minds. We may start with a set of assumptions about how the world works, but as we gather more data, we may need to update our opinions based on what we see in the data.

Opinions can be expressed as probabilities, and these too can be updated as information comes in. In this section we will develop a method for updating probabilities given data. We will start with an example and then state the method in greater generality.

There is a rare disease in a population: only 0.4% of the people have it. There is a test for the disease. For people who have the disease, the test has a 99% chance of returning a positive result. For people who do not have the disease, it has a 99.5% chance of returning a negative result. On the whole, this is a pretty good test.

Pick a person at random from the population. Given that the person tests positive, what is the chance that the person has the disease?

Here is the tree diagram we drew in Data 8 to summarize the information in the problem.

To solve this problem, we will use the division rule. Let D be the event that the person has the disease and, with some abuse of notation, let + be the event that the person tests positive. Then what we are looking for is P(D | +). By the division rule,

$P(D \mid +) = \frac{P(D \text{ and } +)}{P(+)} = \frac{0.004 \cdot 0.99}{0.004 \cdot 0.99 + 0.996 \cdot 0.005} \approx 44.3\%$
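The same calculation can be wrapped in a small function so that the effect of the disease rate is easy to explore. This is a sketch, not the text's own method; the function name `posterior` and the parameter names `sensitivity` (chance of a positive result given disease) and `specificity` (chance of a negative result given no disease) are labels introduced here.

```python
def posterior(prior, sensitivity, specificity):
    """P(D | +) by the division rule, with P(+) found by partitioning
    on whether or not the person has the disease."""
    true_positive = prior * sensitivity                # P(D and +)
    false_positive = (1 - prior) * (1 - specificity)   # P(not D and +)
    return true_positive / (true_positive + false_positive)


p = posterior(prior=0.004, sensitivity=0.99, specificity=0.995)
print(round(100 * p, 1))  # percent chance of disease given a positive test

# A less rare disease gives a much higher posterior with the same test.
print(round(100 * posterior(0.05, 0.99, 0.995), 1))
```

The denominator has two terms because a positive result can arise in two disjoint ways: a true positive or a false positive.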

Generally speaking, if the whole result space can be divided into events.
