How to express complex game theory relations with mathematics?
Mathematical model of game theory

Authors: Zhu Kezhen College, Mixed Class 01

Wang Dafang, He Pei, Zou Ming

Abstract

Game theory is now widely used, and many decision-making problems can be explained by game-theoretic models. This paper first expresses real-life game behavior in mathematical terms and derives the general outcome of a game, and then discusses how different external constraints affect the course of the game. We take oligopolistic (monopoly) competition in economics as the example, discuss the decisions of producers in different situations, and then analyze the motives and possibilities of both sides.

(A) Establishing the basic game model

First, the representation of game behavior

The normal-form representation of a game specifies:

1. The participants in the game.

2. The set of strategies available to each participant.

3. The payoff each participant receives for every combination of strategies the participants may choose.

In an n-player game, let $S_i$ denote the strategy space of participant $i$, and let $s_i$ denote any specific pure strategy in it, $s_i \in S_i$.

The multivariate function $u_i(s_1, s_2, \dots, s_n)$ is the payoff function of the $i$-th player when the $n$ players choose $s_1, s_2, \dots, s_n$.

Second, the solution of the game

When the game enters a stable state, the strategy chosen by the participants must be the best response to the established strategies of other participants. In this state, no one wants to deviate from the status quo alone. This situation is called Nash equilibrium:

In the normal-form game with $n$ players, $G = \{S_1, S_2, \dots, S_n; u_1, u_2, \dots, u_n\}$, if the strategy profile $\{s_1^*, s_2^*, \dots, s_n^*\}$ is such that, for every participant $i$, $s_i^*$ is the best response to the strategies $\{s_1^*, \dots, s_{i-1}^*, s_{i+1}^*, \dots, s_n^*\}$ of the other participants, then $\{s_1^*, s_2^*, \dots, s_n^*\}$ is a Nash equilibrium of the game. That is,

$u_i(s_1^*, \dots, s_{i-1}^*, s_i^*, s_{i+1}^*, \dots, s_n^*) \ge u_i(s_1^*, \dots, s_{i-1}^*, s_i, s_{i+1}^*, \dots, s_n^*)$

for all $s_i \in S_i$.

Nash proved in 1950 that any game with a finite number of players, each with a finite number of pure strategies, has a Nash equilibrium (possibly in mixed strategies). A mixed strategy means choosing among the strategies in the strategy space according to some probability distribution; mixed strategies are not discussed in this paper.

In general, Nash's proof ensures that our equilibrium analysis is meaningful.
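The best-response condition in the definition above can be checked mechanically for a small finite game. The following is a minimal Python sketch, not part of the original paper: the function name `pure_nash_equilibria` and the 2x2 payoff table are illustrative assumptions, and only pure strategies are examined.

```python
from itertools import product

def pure_nash_equilibria(strategies, payoff):
    """Return all pure-strategy profiles where no player gains by deviating alone.

    strategies: list of per-player strategy lists (S_1, ..., S_n).
    payoff: function mapping a profile (tuple) to a tuple of payoffs (u_1, ..., u_n).
    """
    equilibria = []
    for profile in product(*strategies):
        stable = True
        for i, s_i_set in enumerate(strategies):
            current = payoff(profile)[i]
            for alt in s_i_set:
                # Replace only player i's strategy, keeping the others fixed.
                deviated = profile[:i] + (alt,) + profile[i + 1:]
                if payoff(deviated)[i] > current:
                    stable = False
                    break
            if not stable:
                break
        if stable:
            equilibria.append(profile)
    return equilibria

# Hypothetical 2x2 game; the numbers are illustrative only.
table = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
print(pure_nash_equilibria([["C", "D"], ["C", "D"]], lambda p: table[p]))
```

The check is a direct transcription of the inequality $u_i(\dots, s_i^*, \dots) \ge u_i(\dots, s_i, \dots)$; it enumerates every profile, so it is only practical for small strategy sets.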

Third, a game example: Cournot competition in a single-stage game

In Cournot competition, a small number of manufacturers influence the price by adjusting their output so as to maximize their own income.

We make the following assumptions:

1. The goods produced by all manufacturers are identical, and consumers have no preference for any particular manufacturer.

2. Price and total supply $Q$ in the market are related by $p = a - bQ$. An increase in supply does not create a surplus but only lowers the price; that is, manufacturers can sell everything they produce.

3. Manufacturers are rational: facing a given situation, each makes the decision that maximizes its own interest.

4. Information is complete: each manufacturer knows that the other manufacturers are rational, each knows that the others know this, and so on. This fact is known to all participants.

(B) The solution and discussion of the game model

For simplicity, we start with the case of a single enterprise.

When there is only one enterprise, its revenue function is $u = Q(a - bQ)$.

Maximizing $u$ gives $Q_0 = a/(2b)$ and $u_0 = a^2/(4b)$.

When there are two enterprises, let their outputs be $Q_1$ and $Q_2$ respectively. Then

$p = a - b(Q_1 + Q_2)$

$u_1(Q_1, Q_2) = p Q_1 = Q_1[a - b(Q_1 + Q_2)]$

$u_2(Q_1, Q_2) = p Q_2 = Q_2[a - b(Q_1 + Q_2)]$

The Nash equilibrium point $(Q_1^*, Q_2^*)$ is the solution of the equations

$\partial u_1 / \partial Q_1 = 0$ (1)

$\partial u_2 / \partial Q_2 = 0$ (2)

Rearranging, we get

$2bQ_1 + bQ_2 = a$ (3)

$bQ_1 + 2bQ_2 = a$ (4)

The solution is $Q_1^* = Q_2^* = a/(3b)$, and the corresponding payoffs are $u_1 = u_2 = a^2/(9b)$.
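As a check on the duopoly result, the two first-order conditions $2bQ_1 + bQ_2 = a$ and $bQ_1 + 2bQ_2 = a$ can be solved exactly. The sketch below is an illustration rather than the authors' method: the function name `duopoly_equilibrium` and the sample values a = 12, b = 1 are assumptions chosen only to make the numbers easy to read.

```python
from fractions import Fraction

def duopoly_equilibrium(a, b):
    """Solve 2bQ1 + bQ2 = a and bQ1 + 2bQ2 = a by Cramer's rule (exact arithmetic)."""
    a, b = Fraction(a), Fraction(b)
    det = (2 * b) * (2 * b) - b * b      # determinant of [[2b, b], [b, 2b]]
    q1 = (a * 2 * b - b * a) / det       # first column replaced by (a, a)
    q2 = (2 * b * a - a * b) / det       # second column replaced by (a, a)
    price = a - b * (q1 + q2)            # inverse demand p = a - bQ
    return q1, q2, price * q1, price * q2

# Illustrative parameters (assumed, not from the paper): a = 12, b = 1.
q1, q2, u1, u2 = duopoly_equilibrium(12, 1)
print(q1, q2, u1, u2)  # outputs a/(3b) and payoffs a^2/(9b) for these values
```

With a = 12 and b = 1 the closed forms give a/(3b) = 4 and a^2/(9b) = 16, which is what the exact solve returns.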

The Nash equilibrium point is a maximum of each firm's payoff given the other's output. Once it is reached, neither side has an incentive to be the first to change.

Next we discuss whether the Nash equilibrium point is isolated, that is, whether the two sides, when their initial decisions are not at the Nash equilibrium, can move the game to the Nash equilibrium point through rational profit-maximizing adjustments.

Formula (1) gives the best-response function of manufacturer 1: given the other party's output $Q_2$, manufacturer 1 maximizes its own profit according to (1), which yields equation (3). That is, the best-response function of manufacturer 1 is $Q_1 = (a - bQ_2)/(2b)$. Likewise, equation (4) gives the best-response function of manufacturer 2, $Q_2 = (a - bQ_1)/(2b)$.

These are two straight lines, as shown in the figure, and their intersection E is the Nash equilibrium point.

AB is the best-response function of manufacturer 1, and CD is the best-response function of manufacturer 2.

Suppose the initial point chosen by the two parties is A, that is, $Q_1 = 0$, $Q_2 = a/b$. Since A lies on the best-response function of manufacturer 1, manufacturer 1 will not change its output, but the best point for manufacturer 2 given $Q_1 = 0$ is C, so the decision point of the two parties shifts to C, and manufacturer ...
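The adjustment that starts at point A can be simulated numerically. The sketch below assumes the two manufacturers take turns playing their best responses $Q_i = (a - bQ_j)/(2b)$, with manufacturer 2 moving first; the alternating order, the function names, and the values a = 12, b = 1 are assumptions made for illustration, not details stated in the paper.

```python
def best_response(q_other, a, b):
    """Profit-maximizing output against the rival's output, floored at zero."""
    return max(0.0, (a - b * q_other) / (2 * b))

def adjust(q1, q2, a, b, rounds=50):
    """Alternate best responses, manufacturer 2 moving first; return the path of (Q1, Q2)."""
    path = [(q1, q2)]
    for _ in range(rounds):
        q2 = best_response(q1, a, b)   # manufacturer 2 reacts to Q1
        path.append((q1, q2))
        q1 = best_response(q2, a, b)   # manufacturer 1 reacts to the new Q2
        path.append((q1, q2))
    return path

a, b = 12.0, 1.0                        # illustrative parameters (assumed)
path = adjust(0.0, a / b, a, b)         # start at A = (0, a/b)
print(path[1])                          # first move: point C = (0, a/(2b))
print(path[-1])                         # approaches E = (a/(3b), a/(3b))
```

Each best response halves the distance to E along one axis, so under this alternating rule the path converges to the intersection of the two lines from any non-negative starting point.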

Under the assumption of complete information, this whole series of adjustments can be foreseen before either party decides, and any decision by either manufacturer other than point E is not the best choice under the given conditions, so both parties will choose their output at point E from the outset. But when

$Q_1 = Q_2 = \frac{1}{2} \cdot \frac{a}{2b}$ (5)

both parties obtain the maximum joint benefit,

$u_1 = u_2 = \frac{1}{2} \cdot \frac{a^2}{4b}$ (6)

On the one hand, this shows that the Nash equilibrium point is not the jointly best decision point. On the other hand, it shows that competition between two manufacturers improves the social outcome relative to monopoly: total social output rises from $a/(2b)$ to $\frac{2}{3} \cdot \frac{a}{b} = 2a/(3b)$.

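When the number of manufacturers increases to $n$, the model becomes

$p = a - b \sum_{i=1}^{n} Q_i$ (7)

$u_i = p Q_i, \quad i = 1, 2, \dots, n$ (8)

$\partial u_i / \partial Q_i = 0, \quad i = 1, 2, \dots, n$ (9)

It can be shown by induction that (9) reduces to the following system of equations (written in matrix form):

$$
\begin{pmatrix} 2 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 2 \end{pmatrix}
\begin{pmatrix} Q_1 \\ Q_2 \\ \vdots \\ Q_n \end{pmatrix}
= \frac{a}{b} \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}
$$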

By linear algebra, this system has a unique non-zero solution

$Q_1^* = Q_2^* = \dots = Q_n^* = a/((n+1)b)$,

$u_i^* = a^2/((n+1)^2 b)$

The total social output is $na/((n+1)b)$.
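The n-firm system, with 2 on the diagonal of the coefficient matrix, 1 everywhere else, and a/b on the right-hand side, can be solved exactly to confirm the closed form. This is a minimal sketch under assumed names (`cournot_n`) and assumed sample parameters (a = 12, b = 1, n = 3); it uses plain Gauss-Jordan elimination rather than any particular method from the paper.

```python
from fractions import Fraction

def cournot_n(a, b, n):
    """Solve the n-firm first-order conditions exactly and return (outputs, payoffs)."""
    a, b = Fraction(a), Fraction(b)
    # Augmented matrix: 2 on the diagonal, 1 elsewhere, right-hand side a/b.
    m = [[Fraction(2 if i == j else 1) for j in range(n)] + [a / b] for i in range(n)]
    for col in range(n):
        # Pick a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    q = [row[n] for row in m]
    price = a - b * sum(q)              # p = a - b * sum(Q_i)
    return q, [price * qi for qi in q]

# Illustrative parameters (assumed): a = 12, b = 1, n = 3.
outputs, payoffs_n = cournot_n(12, 1, 3)
print(outputs, payoffs_n)
```

For these values the closed forms give a/((n+1)b) = 3 per firm and a^2/((n+1)^2 b) = 9 per firm, matching the exact solve.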

This shows that oligopolistic competition among $n$ manufacturers always has a Nash equilibrium point, and the same method shows that the Nash equilibrium point is not isolated, so all rational parties will make their output decisions at the equilibrium point.

In addition, the larger $n$ is, the more thorough the competition and the higher the total social output. As $n$ grows large, total output tends to $a/b$ and the price $p$ tends to 0; the model no longer applies in that regime, because only when $n$ is small (generally less than 5) can the manufacturers control the price through their own output.

The jointly best choice for the manufacturers is $Q_1^* = Q_2^* = \dots = Q_n^* = a/(2nb)$,

which gives each of them an income of $a^2/(4nb)$. Clearly, the larger $n$ is, the greater the gap between the outcome of the manufacturers' rational game and their jointly best choice.
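The widening gap can be made concrete by dividing the per-firm income at the joint optimum, $a^2/(4nb)$, by the per-firm income at the Nash equilibrium, $a^2/((n+1)^2 b)$; the parameters a and b cancel. The helper name `profit_ratio` below is an assumption introduced for this illustration.

```python
from fractions import Fraction

def profit_ratio(n):
    """Per-firm joint-optimum income divided by per-firm Nash income: (n+1)^2 / (4n)."""
    return Fraction((n + 1) ** 2, 4 * n)

for n in range(1, 6):
    print(n, profit_ratio(n))
```

The ratio equals 1 for a single firm, where there is nothing to coordinate, and increases with every additional firm.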

(C) Multi-stage games and collusion

As the above shows, the manufacturers as players have reason to seek an agreement to limit output, but the jointly best point is unstable: the party that breaks the agreement first gains extra profit, so some conditions are needed to restrain the behavior of both parties. Moreover, collusion only pays off in the long run. Both parties must constantly check whether the other has broken the agreement and decide whether to break it themselves. Each such round is a single-stage game of the kind described above.

The information condition here is that in stage n every enterprise can observe the outcome of stage n-1. The rule, known to both parties, is that once the other party breaks the agreement, one breaks it as well and never honors it again.

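We introduce a time discount factor $v$, $0 < v < 1$. If both parties keep the agreement in every stage, the discounted income of each is

$a^2(1 + v + v^2 + \dots)/(8b) = a^2/[8(1-v)b]$ (10)

For the party that defaults first, the other party is producing $a/(4b)$, so by (3) and (4) its optimal output is $3a/(8b)$, and its income in that stage is

$[a - b(3/8 + 1/4)a/b] \cdot \frac{3}{8} \cdot \frac{a}{b} = 9a^2/(64b)$ (11)

From then on, both sides understand that the collusion has broken down, and both produce at the equilibrium output $a/(3b)$. If one party defaults in stage $N$, its income is

$a^2(1 + v + v^2 + \dots + v^{N-1})/(8b) + \frac{9v^N}{64} \cdot \frac{a^2}{b} + v^{N+1} \cdot a^2/[9(1-v)b]$ (12)

Subtracting (10) from (12) gives

$\left[\frac{v^N}{64} - \frac{v^{N+1}}{72(1-v)}\right] \cdot \frac{a^2}{b}$

Default does not pay when this difference is negative, which gives $v > 9/17$.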
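The sign of the difference (12) - (10) can be evaluated exactly for any discount factor. The sketch below works in units of $a^2/b$; the function name `default_gain` and the sample discount factors are assumptions for illustration. Setting the bracket to zero, $1/64 = v/(72(1-v))$, gives the break-even value v = 9/17.

```python
from fractions import Fraction

def default_gain(v, n):
    """(12) - (10) in units of a^2/b: the gain from being first to default at stage n."""
    v = Fraction(v)
    # One-stage bonus 9/64 - 1/8 = 1/64 at stage n, then a loss of
    # 1/8 - 1/9 = 1/72 in every later stage, discounted back.
    return v ** n / 64 - v ** (n + 1) / (72 * (1 - v))

for v in (Fraction(1, 2), Fraction(9, 17), Fraction(3, 5)):
    print(v, default_gain(v, 3))
```

A positive value means default pays, a negative value means it does not; the stage index n only scales the result by $v^n$ and does not change its sign.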

(D) Deeper collusion and monitoring issues

In a long-term game, people need more complicated mechanisms to sustain a non-Nash outcome that maximizes joint benefits. Unlike the previous model, in each single-stage game a player's judgment of the opponent rests not only on the previous outcome but on long-term experience. This raises the question of reputation, that is, of assigning a probability to an uncertain factor. Such a model lets a player make the most favorable decision against each possible strategy of the opponent. Cooperation generally appears in stages far from the end of the game, whereas in the last few stages players tend to care only about their immediate interests.

The reputation-maintaining strategy we propose is "returning a peach for a plum" (tit for tat): a player's next decision is the same as the opponent's last decision.

Modify the oligopoly competition model above as follows:

1. Rational player B knows that player A follows the reciprocating strategy with probability $P$ and chooses other strategies with probability $(1-P)$ (in which case A is a rational player). A also knows that B is rational.

2. At each stage $n$, both parties decide simultaneously, and both know the outcomes of the previous $n-1$ stages. Once A departs from the "returning a peach for a plum" principle and instead makes a rational benefit-maximizing decision, B regards A as rational, and this becomes common knowledge between A and B. From then on, the game degenerates into the complete-information rational game discussed above, whose solution is the Nash equilibrium point.

Single-stage game

For the single-stage game, following the discussion of formula (5), cooperation means that a manufacturer produces $a/(4b)$; otherwise the manufacturer produces according to profit maximization. The first defaulting manufacturer produces $3a/(8b)$ and earns $9a^2/(64b)$, after which all manufacturers produce $a/(3b)$ and earn $a^2/(9b)$. (For convenience, the constant factor $a^2/b$ is omitted here and below.) The payoff matrix of the two parties, with entries (payoff to A, payoff to B), is

A \ B             Cooperate       Not cooperate
Cooperate         (1/8, 1/8)      (5/48, 5/36)
Not cooperate     (5/36, 5/48)    (1/9, 1/9)
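The entries of the payoff matrix follow from the demand curve $p = a - b(Q_A + Q_B)$ with cooperation meaning an output of a/(4b) and non-cooperation meaning an output of a/(3b). The sketch below recomputes them in units of $a^2/b$; the names `COOPERATE`, `DEFECT` and `payoffs` are assumptions introduced for the illustration.

```python
from fractions import Fraction

# Outputs in units of a/b: cooperation is a/(4b), non-cooperation is a/(3b).
COOPERATE, DEFECT = Fraction(1, 4), Fraction(1, 3)

def payoffs(q_a, q_b):
    """Profits of A and B in units of a^2/b under p = a - b(Q_A + Q_B)."""
    price = 1 - (q_a + q_b)   # price in units of a
    return price * q_a, price * q_b

for name_a, q_a in (("Cooperate", COOPERATE), ("Not cooperate", DEFECT)):
    for name_b, q_b in (("Cooperate", COOPERATE), ("Not cooperate", DEFECT)):
        print(name_a, "/", name_b, payoffs(q_a, q_b))
```

The same function reproduces the one-shot default payoff quoted in the text: against an opponent producing a/(4b), an output of 3a/(8b) earns 9/64.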

Two-stage game

In the two-stage game, a rational B will choose not to cooperate in the second stage. At the start of the first stage B must form a belief about A: A reciprocates with probability $P$. If B chooses to cooperate in the first stage, B's expected income in the first stage is

$P \cdot 1/8 + (1-P) \cdot 5/48$ (12)

and B's expected income in the second stage is

$P \cdot 5/36 + (1-P) \cdot 1/9$ (13)

(If A does not reciprocate, B learns this at the end of the first stage, and both sides choose the Nash equilibrium point in the second stage.)

If B chooses not to cooperate in the first stage, B produces $a/(3b)$. (Non-cooperation here does not mean producing $3a/(8b)$, because B does not yet know whether A is a rational player, and we find empirically that the output $a/(3b)$ has a higher expected income than $3a/(8b)$.)

B's expected income in the first stage is then

$5P/36 + (1-P)/9$ (14)

and B's expected income in the second stage is

$1/9$ (15)

(whether or not A is rational, neither party cooperates in this stage).

The difference (12) + (13) - [(14) + (15)] equals $P/48 - 1/144$, which is non-negative when $P \ge 1/3$ (about 33%).

So in the two-stage game, as long as B estimates that A reciprocates with probability at least 1/3, B will choose to cooperate in the first stage.

Given the information assumption of the model, A fully understands B's reasoning above, so A has an incentive at least to pose as a reciprocating player.

Three-stage game

Now extend the game to three stages. As long as B cooperates in the first stage, the last two stages reduce to the two-stage game. By the analysis above, B's expected incomes in the three stages are

$u_1 = P/8 + 5(1-P)/48$

$u_2 = P/8 + (1-P)/9$

$u_3 = 5P/36 + (1-P)/9$

Total expected return: $u_1 + u_2 + u_3 = 47/144 + P/16$ (16)

If B does not cooperate in the first stage, then A, whether reciprocating or not, will not cooperate in the second stage, and a rational B will certainly not cooperate in the third stage.

If B also chooses not to cooperate in the second stage, B's expected incomes in the three stages are

$u_1 = 5P/36 + (1-P)/9$, $u_2 = 1/9$, $u_3 = 1/9$

Total expected return: $u_1 + u_2 + u_3 = 1/3 + P/36$ (17)

Comparing (16) and (17), when $P \ge 20\%$ we have (16) $\ge$ (17), so B has no motive to deviate in the first stage.

If B does not cooperate in the first stage, cooperates in the second stage, and does not cooperate in the third stage, B's expected incomes in the three stages are

$u_1 = 5P/36 + (1-P)/9$, $u_2 = 5/48$, $u_3 = 5P/36 + (1-P)/9$

The total expected return is $P/18 + 47/144$, which is always less than (16). In this case too, B has no motive to deviate in the first stage.

To sum up, as long as A reciprocates with probability at least 20%, B has no motive to deviate from cooperation in the first two stages.
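The three-stage totals can be recomputed exactly from the per-stage payoffs. In the sketch below the function names are assumptions, and the third function reads the stage-two payoff of 5/48 as B returning to cooperation while A does not cooperate, which is an interpretation of the per-stage values rather than something the text states outright.

```python
from fractions import Fraction

def total_cooperate(p):
    """(16): B cooperates in stages 1 and 2 and does not cooperate in stage 3."""
    u1 = p / 8 + Fraction(5, 48) * (1 - p)
    u2 = p / 8 + (1 - p) / 9
    u3 = Fraction(5, 36) * p + (1 - p) / 9
    return u1 + u2 + u3

def total_defect_throughout(p):
    """(17): B does not cooperate in any of the three stages."""
    return Fraction(5, 36) * p + (1 - p) / 9 + Fraction(2, 9)

def total_defect_then_repair(p):
    """B defects in stage 1, cooperates in stage 2, defects in stage 3 (assumed reading)."""
    return 2 * (Fraction(5, 36) * p + (1 - p) / 9) + Fraction(5, 48)

p = Fraction(1, 5)  # the 20% threshold; pass probabilities as Fraction for exact results
print(total_cooperate(p), total_defect_throughout(p), total_defect_then_repair(p))
```

At P = 1/5 the first two totals coincide, and for larger P cooperation in the first stage gives the highest of the three totals.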

For A, once A deviates from cooperation in the first stage, the fact that A is rational becomes common knowledge from the second stage on. A's expected return is then $5/36 + 1/9 + 1/9 = 13/36$.

If A keeps cooperating, A's equilibrium income is $1/8 + 1/8 + 1/9 = 13/36$.

Therefore, in the three-stage game it makes no difference to A whether A deviates from cooperation; this is merely a coincidence arising from the particular numbers in this problem.

Multi-stage extension

As the three-stage case shows, as the number of stages increases each player weighs long-term benefits more heavily than the immediate situation. This means that only a small reputation probability $P$ is needed to deter the other party from betrayal.

When the game has $T$ stages in total, we can prove by induction that rational parties choose to cooperate in stages 1 to $T-2$, and act in stages $T-1$ and $T$ as in the two-stage game discussed above. Suppose the conclusion holds for any game with $t$ stages ($t < T$).

If A defaults in stage $t < T-1$, A's income is $(t-1)/8 + 5/36 + (T-t)/9$.

A's equilibrium income is 1/8 in each of stages 1 to $T-2$, 5/36 in stage $T-1$, and 1/9 in the last stage. Clearly, the income from early default is less than the equilibrium income.

For B, the games analyzed above show that B has no motive to deviate from cooperation in stage $T-2$ or stage $T-1$, so B could only deviate from cooperation in some stage $t \le T-3$. Once B deviates from cooperation in stage $t$,

both a reciprocating A and a rational A will not cooperate in stage $t+1$,

so through stage $t+1$ B still cannot tell whether A is rational, and from stage $t+2$ on the game between the two sides is equivalent to a game with $T-(t+1)$ stages.

By the induction hypothesis, in this remaining game the two sides cooperate up to stage $T-2$ and then proceed as in the two-stage game above. B's total income is

$u = \frac{1}{8}(t-1) + \frac{5}{36} + \frac{5}{48} + [T-2-(t+2)+1] \cdot \frac{1}{8} + [P/8 + 5(1-P)/48 + 5P/36 + (1-P)/9]$

This is less than B's equilibrium income $(T-2)/8 + [P/8 + 5(1-P)/48 + 5P/36 + (1-P)/9]$.

So B has no motive to deviate even once.

More generally, suppose B deviates from and returns to cooperation several times during the first $T-3$ stages. Applying the induction argument repeatedly in the same way shows that B's expected income is lower still. The fundamental reason is that the first party to break the agreement cannot determine the other party's true type and so cannot guarantee that its own interest is maximized. Once the agreement breaks down, repairing it is costly, which makes the extra gain from breaking the agreement smaller than the gain from continued cooperation ($5/36 + 5/48 < 2 \cdot 1/8$). This mechanism makes the collusion more binding.

Summary and further study

This paper sets up a mathematical model of the static game and uses it to analyze an example: Cournot competition and collusion in an oligopoly market. In a static game, the equilibrium solution of the game is found as a mathematical maximum. Rational decision-making drives behavior toward the point of maximum benefit, and information is the most important prerequisite of rational decision-making; different information conditions lead to different rational decisions. This paper discusses the most complete information assumption, complete information: each side knows the other's situation, knows that the other side knows its own situation, and so on, forming an infinite recursive chain. The reciprocity model discussed at the end does not have complete information, but it still has a set of evaluation criteria known to both parties that constrains their decisions. In short, the models discussed here are games in which both parties know the rules, an idealized simplification of real games. How to handle the infinite recursive chain of information properly under this simplification is a problem for further study. As far as oligopoly is concerned, the greatest idealization in this model is the linear relationship between price and supply; this relationship could be fitted more closely to reality, yielding different income functions and possibly multiple Nash equilibrium points for further analysis.
