A Simple Markov Chain

In this paper, a Markov chain is used to model the reproduction of a fixed, finite population. The binomial distribution is used to describe the probability of gene inheritance between generations and to establish the transition matrix. Using conditional expectation, we show that the probabilities of absorption at 0 and at 2N depend only on the initial fraction of gene A, and that the probability of absorption at 2N is the fixation probability of gene A. This conclusion agrees with the results of a MATLAB simulation.


Introduction
The population size is assumed to be fixed at every step of our model. Our population contains 2N individuals, each having either a type-a or a type-A genetic characteristic. To produce an offspring, one individual of the parent generation is chosen at random, and the offspring has the same genetic type as that parent. In a population, some genes are passed on to the next generation, while others are not. If there is no mutation and no selective pressure, what happens to the inheritance of genes in the population? Will a gene be lost, and will a gene become fixed? These questions have important implications for the evolution and extinction of species. Hence, we want to answer the following questions. Question 1: What is the probability that a randomly chosen individual of the parent generation is of type a or of type A, respectively? Question 2: What does the duplication process look like? Question 3: Will a certain type of gene disappear, and if so, how likely is that to happen?
The general idea of a Markov process is that the future depends only on the present, not on the past. In our model, the gene frequency in the next generation depends only on the present generation, not on previous generations. We therefore use a Markov chain to address the three questions above.

Assumptions
Our model follows these rules:
(1) The population size is fixed in every generation; the population contains 2N individuals.
(2) Each individual has either a type-a gene or a type-A gene; there are no other types of genes.
(3) Generations do not overlap: all individuals of a generation die simultaneously.
(4) Whether or not a given ancestor has descendants is completely random in every generation.
(5) There is no mutation [1].

Model analysis
In our model, let A and a denote two genetic characteristics in a given population. The model is discrete, non-overlapping and mutation-free. Discrete means that the offspring appear together at the same time rather than continuously. Non-overlapping means that no individual of generation t survives into generation t+1. This can be understood as follows: all parents in the population give birth to their offspring at the same time.
When the offspring appear, all the parents disappear, which ensures that the generations do not interfere with each other. No mutation means that all offspring of type-A individuals are again of type A, and likewise for type a. This is an idealized state; reality is more complicated.
As an example, consider a small population of 4 individuals. This is unrealistic, but the simple example illustrates the duplication process of the model.

Duplication process
Since we assume that the population size is constant, there will again be 4 individuals in the next generation. In the first generation, the number of type-A individuals can be 0, 1, 2, 3 or 4. If all 4 ancestors are of type A, there will be no type-a gene in the next generation. Similarly, if there are 0 type-A ancestors, there will be no type-A gene in the next generation. The number of type-A individuals and the corresponding probabilities are presented in Table 1 and Figure 1 (n denotes the number of type-A individuals, p the corresponding probability).

To illustrate the duplication process, we represent each individual with a small circle. Blue circles represent individuals with a type-A gene, and yellow circles represent individuals with a type-a gene (see Figure 2). Without loss of generality, we assume that there are 3 individuals of type A. Each individual of the offspring generation picks a parent at random from the previous generation, so the number of type-A individuals in the next generation may be 0, 1, 2, 3 or 4. Parent and child are linked by a line, and each offspring inherits the genetic type of its parent. One possible result for the second generation is shown in Figure 2.

After a few generations, the picture looks something like Figure 3, which comes in a tangled and an untangled version. The untangled diagram shows the same process, except that the individuals have been shuffled a little to avoid the confusion of many crossing lines. The genealogical relationships are unchanged; the children of one parent are simply placed together, close to that parent [3].
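The duplication step described above can be sketched in a few lines of Python. This is only an illustration under the stated assumptions (4 individuals, 3 of type A in the first generation); the function name `next_generation` and the fixed random seed are hypothetical choices, not part of the original model description.

```python
import random


def next_generation(parents, rng):
    """Each offspring copies the type of a parent chosen uniformly at random.

    The population size stays constant: one offspring per slot in `parents`.
    """
    return [rng.choice(parents) for _ in parents]


rng = random.Random(0)             # fixed seed only for reproducibility (assumption)
generation = ["A", "A", "A", "a"]  # first generation: 3 type-A, 1 type-a

for t in range(1, 6):
    # Print the generation index, its members and the number of type-A individuals.
    print(t, "".join(generation), generation.count("A"))
    generation = next_generation(generation, rng)
```

Because several offspring may pick the same parent while other parents are picked by nobody, the number of type-A individuals drifts from one generation to the next even without selection.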

Distribution
Consider a small population of size $2N = 4$. We focus on how the frequency of the genes changes from generation to generation. Suppose that in generation 1 there are three individuals with gene A, so that the frequency of gene A is 0.75. What is the probability of having 0, 1, …, 4 A's in the second generation?
In a population with $2N = 4$, one individual of the parent generation is chosen at random for each offspring, and the offspring is of the same genetic type. Each individual has the same chance of being the ancestor of an individual in the next generation, so the second generation is determined by 4 independent Bernoulli trials. If the number of A genes is $i$ ($0 \le i \le 4$) and the number of a genes is $4 - i$, then the probability that an offspring has gene A is $p = i/4$ and the probability that an offspring has gene a is $q = 1 - i/4$. For the second generation, let $P(X = k)$ denote the probability that the number of A genes is $k$.
We calculate these probabilities from the formula of the binomial distribution with $2N = 4$:
$$P(X = k) = \binom{4}{k} p^{k} q^{4-k}, \qquad k = 0, 1, \dots, 4.$$
Table 2 shows the genetic situation of each generation if $2N = 4$.
Table 2. Probabilities in second generation (edited by the author)
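As a worked check of the binomial formula for the example in the text (4 individuals, 3 of type A, so p = 0.75), the following Python sketch evaluates the five probabilities. The function name `offspring_distribution` is a hypothetical helper introduced here for illustration.

```python
from math import comb


def offspring_distribution(i, size):
    """Binomial probabilities of k type-A offspring, k = 0..size,
    when i of the `size` parents are of type A."""
    p = i / size  # probability that one offspring picks a type-A parent
    return [comb(size, k) * p**k * (1 - p) ** (size - k) for k in range(size + 1)]


probs = offspring_distribution(3, 4)
for k, pr in enumerate(probs):
    print(k, pr)
```

In exact fractions the five values are 1/256, 12/256, 54/256, 108/256 and 81/256, which sum to 1; keeping 3 type-A individuals is the most likely single outcome.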

The state space
We fix our attention on the frequency of gene A in the population of size $2N$. The general idea of a Markov process is that the future depends only on the present, not on the past. This discrete-time process is an example of a Markov chain [4]: the gene frequency in generation $t+1$ depends only on generation $t$, not on the states before $t$. The process of inheritance between generations can therefore be regarded as a Markov chain.
The state of the chain is the number of genes of type A. Let $X_t$ be the number of copies of gene A in generation $t$. In any generation, $X_t$ takes one of the values $0, 1, \dots, 2N$, and these values constitute the state space, that is, $X_t \in \{0, 1, \dots, 2N\}$.

Transition matrix
In our model, one individual of the parent generation is chosen at random for each offspring, and the offspring is of the same genetic type. Each individual has the same chance of being the ancestor of an individual in the next generation. Thus, the next generation is determined by $2N$ independent Bernoulli trials. If the current generation consists of $i$ genes of type A and $2N - i$ genes of type a, then for each Bernoulli trial the probability of obtaining gene A is $p = i/2N$ and the probability of obtaining gene a is $q = 1 - i/2N$.

We thus generate a Markov chain $\{X_t\}$, where $X_t$ is the number of A genes in the $t$-th generation. Since the population size is a constant $2N$, $X_{t+1}$ is a binomial random variable with index $2N$ and success probability $X_t/2N$. The transition probabilities from $X_t = i$ to $X_{t+1} = j$ are computed from the binomial distribution, in which $2N$ is the population size and $p$ is the frequency of gene A and therefore the probability that an individual picks a parent with gene A. We can write the transition probability matrix of the Markov chain as $P = (P_{ij})$. Since $X_t \in \{0, 1, \dots, 2N\}$, the matrix has size $(2N+1) \times (2N+1)$. If $X_t = i$ in generation $t$, then the probability that $X_{t+1} = j$ in the next generation is given by the binomial formula
$$P_{ij} = P(X_{t+1} = j \mid X_t = i) = \binom{2N}{j} \left(\frac{i}{2N}\right)^{j} \left(1 - \frac{i}{2N}\right)^{2N-j}, \qquad i, j = 0, 1, \dots, 2N.$$

Figure 4 shows what the second generation might look like if $p = 0.75$. By repeatedly multiplying the current distribution by the transition matrix, we can predict the distribution of gene frequencies any number of generations into the future.
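The binomial transition probabilities and their repeated application can be sketched in Python as follows. This is a minimal illustration using plain lists; the names `transition_matrix` and `step`, and the choice of a population of 4 started from 3 type-A individuals, are assumptions taken from the small example used earlier rather than code from the paper.

```python
from math import comb


def transition_matrix(size):
    """Entry P[i][j] is the binomial probability of j type-A offspring
    given i type-A parents in a population of `size` individuals."""
    matrix = []
    for i in range(size + 1):
        p = i / size
        matrix.append(
            [comb(size, j) * p**j * (1 - p) ** (size - j) for j in range(size + 1)]
        )
    return matrix


def step(dist, matrix):
    """Advance a distribution over states by one generation (row vector times P)."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


P = transition_matrix(4)          # 5 x 5 matrix for a population of 4
dist = [0.0, 0.0, 0.0, 1.0, 0.0]  # start with exactly 3 type-A individuals
for _ in range(3):                # distribution three generations later
    dist = step(dist, P)
print(dist)
```

The first and last rows of the matrix are (1, 0, …, 0) and (0, …, 0, 1): once gene A is lost or fixed, the chain stays where it is.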

Simulating the evolution of the population
When simulating allele frequencies, we do not need to pick a random parent for each individual one by one. We can simply draw a random number from the binomial distribution (with the appropriate $2N$ and $p$) and use it, divided by $2N$, as the frequency of the allele in the next generation. A simple MATLAB function, zq(), performs this simulation; the code is shown in Figure 5. Running it in MATLAB with $p = 0.5$, $p = 0.75$ and $p = 0.25$ gives the following figures:
Fig. 6 The model simulation (edited by the author)
As shown in Figure 6, the probability of losing the gene is much higher when $p$ is smaller.
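The simulation idea can be sketched in Python as below. This is not the MATLAB function zq() from Figure 5, whose code is not reproduced in the text; it is an independent illustration. The population size of 100, the number of runs, the seed and the function names `simulate` and `fixation_fraction` are all assumptions made for this sketch.

```python
import random


def simulate(i0, size, rng, max_generations=100_000):
    """Follow the number of type-A genes until it reaches 0 or `size`.

    Each generation is one binomial draw, realised here as `size`
    Bernoulli trials with success probability (current count) / size.
    """
    counts = [i0]
    while 0 < counts[-1] < size and len(counts) <= max_generations:
        p = counts[-1] / size
        counts.append(sum(rng.random() < p for _ in range(size)))
    return counts


def fixation_fraction(i0, size, runs, seed=0):
    """Fraction of independent runs in which gene A becomes fixed."""
    rng = random.Random(seed)
    fixed = sum(simulate(i0, size, rng)[-1] == size for _ in range(runs))
    return fixed / runs


size = 100  # hypothetical population size 2N
for p0 in (0.25, 0.5, 0.75):
    print(p0, fixation_fraction(int(p0 * size), size, runs=200))
```

With enough runs, the printed fractions should lie close to the initial frequencies, in line with the observation that the gene is lost more often when p is small.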

The absorption state
An absorbing state of a Markov chain is a state that, once entered, can never be left. Since there are no mutations in the model, eventually one of the two genes will be lost (and the other fixed). In other words, states 0 and $2N$ are absorbing: no matter what the initial value of $X_t$ is, eventually $X_t$ will take the value 0 or $2N$, and once this happens it will stay in that state forever. In the case $X_t = 0$ the population consists only of a genes, while in the case $X_t = 2N$ the population is a pure A-gene population.
The population thus attains fixation and is composed of only A genes or only a genes; one of the absorbing states (0 or $2N$) will be entered. Therefore $\lim_{t \to \infty} P(X_t = i) = 0$ for $0 < i < 2N$. We discuss the absorption probabilities in two cases [6].
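Assume that the number of A genes is $i$ in the first generation. Using the expectation formula of the binomial distribution, we obtain
$$E(X_{t+1} \mid X_t = i) = 2N \cdot \frac{i}{2N} = i.$$
Using the expectation of a conditional expectation,
$$E(X_{t+1}) = E\big[E(X_{t+1} \mid X_t)\big] = E(X_t),$$
so the expected number of A genes stays equal to its initial value $i$ in every generation. On the other hand, as $t \to \infty$ the chain is in state 0 or in state $2N$, so
$$\lim_{t \to \infty} E(X_t) = 0 \cdot P(\text{absorption at } 0) + 2N \cdot P(\text{absorption at } 2N).$$
Equating the two expressions gives the two cases:
(1) Absorption at 0: $P(\text{absorption at } 0) = 1 - \dfrac{i}{2N}$.
(2) Absorption at 2N: $P(\text{absorption at } 2N) = \dfrac{i}{2N}$.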
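The claim that the absorption probabilities depend only on the initial fraction i/2N can be checked numerically. The sketch below, in Python, repeatedly applies the binomial transition matrix to the indicator of state 2N until the values stop changing; the function names, the tolerance and the population size of 4 are assumptions made for this illustration.

```python
from math import comb


def transition_matrix(size):
    """Binomial transition probabilities P[i][j] for a population of `size`."""
    return [
        [
            comb(size, j) * (i / size) ** j * (1 - i / size) ** (size - j)
            for j in range(size + 1)
        ]
        for i in range(size + 1)
    ]


def fixation_probabilities(size, tol=1e-12):
    """Probability of eventual absorption at `size` from each start state.

    Starting from h = indicator of the state `size`, the n-th iterate of
    h <- P h equals P(X_n = size | X_0 = i), which converges to the
    absorption probability as n grows.
    """
    P = transition_matrix(size)
    h = [0.0] * size + [1.0]
    while True:
        new = [sum(P[i][j] * h[j] for j in range(size + 1)) for i in range(size + 1)]
        if max(abs(a - b) for a, b in zip(new, h)) < tol:
            return new
        h = new


h = fixation_probabilities(4)
for i, value in enumerate(h):
    print(i, value, i / 4)  # numerical value next to the predicted i/2N
```

The numerical values agree with i/2N up to the chosen tolerance, and the probability of absorption at 0 is the complement.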

Conclusion
According to the above analysis, under the assumption that the population size is fixed, the probabilities of absorption at 0 and at 2N depend only on the initial fraction of gene A, which equals i/2N; the probability of absorption at 2N is the fixation probability of gene A. The larger the proportion of gene A in the first generation, the greater the probability of absorption at 2N. This agrees with the conclusion of the MATLAB simulation.