Study on COVID-19 epidemic prediction and distribution Strategy based on SIR Model

Developing a more scientific COVID-19 response strategy has become a hot topic. By sorting a large amount of data, this paper studies the spread of the epidemic in Shanghai, Beijing, and Changchun in May, summarizes the key factors related to the spread of the epidemic and human immunity, processes and fits the data with a software tool to obtain the law of spread of the COVID-19 outbreak in the three cities, and establishes the SIR infectious disease model. The development of the epidemic in these cities was analyzed and predicted by linear fitting. After comparing the population size, social and economic situation, policy enforcement, and other factors of Shanghai, Beijing, and Changchun, we select Shanghai as the research object for the optimal medical material distribution strategy, analyze the degree of material shortage in each district of Shanghai, and dispatch supplies reasonably by using principal component analysis and cluster analysis.


Introduction
In 2019, there was a sudden outbreak of a highly contagious pneumonia caused by infection with the 2019 novel coronavirus, which then spread around the world. Since December 2019, several cases of pneumonia of unknown cause with exposure to the Huanan seafood market were found in some hospitals in Wuhan, Hubei Province; they were later confirmed to be an acute respiratory infectious disease caused by the 2019 novel coronavirus. On February 11, 2020, Tedros Adhanom Ghebreyesus, Director-General of the World Health Organization (WHO), announced in Geneva, Switzerland, that the unknown coronavirus pneumonia would be named "COVID-19" [1]. On February 22, the National Health Commission (NHC) issued a notice revising the English name of the novel coronavirus pneumonia to COVID-19. On March 11, the WHO declared the outbreak of COVID-19 a global pandemic.
There have been some research reports on SIR Models and related methods used to deal with the spread and prevention of COVID-19. In 2020, Li Yingjie et al. studied a distribution model of medical materials based on PCA and K-Means clustering. They formulated a feasible and reasonable distribution plan based on the actual situation, which has great reference value for distributing materials in areas with a material shortage [2]. The combined model of the Davies-Bouldin index and the K-Means clustering method can also be extended to more fields, which is of great significance for dealing with big data and index classification. In 2021, Yuan Yingxiang et al. published a report titled "Modeling and Research of COVID-19" that drew conclusions about the impact of medical resources on the epidemic. Since some parameters change during the development of the epidemic, this model cannot describe the entire course of the epidemic in China with a single set of parameters, so it has to be applied in phases [3]. In 2022, in the Research and Application of SIR Infectious disease Model in COVID-19 published by Chen Chuang et al., it was found that strengthening isolation control and improving medical and health intervention had a practical effect on containing the spread of the epidemic and reducing casualties [4]. However, in practice, the parameters change as prevention and control measures develop and medical levels improve. Therefore, dynamic parameter processing can be considered in order to simulate the epidemic trend and transmission law more faithfully.
After comparing various methods, we use the SIR Model to predict the epidemic in Xi'an in October, drawing on a comparison of the epidemic spread in Shanghai, Beijing, and Changchun in May 2022. Only about one day passed between the passenger drop-off incident involving a driver from Xinjiang in the early hours of October 4, 2022, and the emergency response and large-scale data screening of October 5. Such prompt action shows that the government has learned the lessons of the fight against COVID-19 and, building on China's experience in preventing and controlling it, was able to take emergency measures quickly. Through extensive data screening and emergency isolation of infected people, together with compulsory isolation, mask wearing, social distancing, and other measures in the areas they traveled through, the numbers of infected, susceptible, and immune people can be accurately predicted at a particular stage of the epidemic.
The SIR Model can simulate the entire process from the onset to the end of an infectious disease. Because it is based on the solution of differential equations, it can fit curves to the existing data accurately, and measures to prevent the spread of the disease can be derived through phase trajectory analysis. Its theoretical basis is sound, its scope of application is broad, and it is practical to operate.

Restatement and analysis of the problem

Restatement of the problem
Question 1: Analyze the fundamental laws and critical factors of the spread of the epidemic in Shanghai, Beijing, and Changchun, and predict the expected time node of eradicating the epidemic in Shanghai and Beijing.
Question 2: Give a reasonable medical material assignment plan with full consideration of the population, area, and economic development level of different regions. In this plan, it is necessary to fully consider the casualties and losses of non-confirmed cases caused by failure to receive prompt treatment.

Problem analysis
Analysis 1: We use Matlab to fit the cumulative number of new infections in Shanghai, Beijing, and Changchun in order to find the regularity of the spread of the epidemic. Predicting when the epidemic will be cleared at the community level requires a prediction model. We first considered multivariate nonlinear fitting, but because the problem concerns an epidemic, we refined the original approach and use the SIR Model.
Analysis 2: Taking Shanghai as an example [5], we found that the factors affecting the degree of material shortage in a region include the total population of the area, the number of confirmed COVID-19 cases, the number of hospitals in the area, the number of medical workers in the area, the number of susceptible people in the area, and regional GDP. For this problem, principal component analysis and cluster analysis are used; the population, area, GDP, and number of hospitals of the different regions are selected as the research objects, from which a reasonable assignment scheme can be obtained.

Model assumptions
1. Assume that the total population base of the epidemic area remains unchanged.
2. Assume that each person has the same probability of contact with infected patients and infection rate.
3. Assume that patients who have been cured will not be infected with COVID-19 again.
4. Assume that the virus will not undergo any new mutation during this outbreak.

Explanation of terms
Zero cases in the community: all new infections are found in quarantine and control places, and there are no infected people in the community.
COVID-19: refers to pneumonia caused by the 2019 novel coronavirus infection.

From these figures (Fig. 1 to Fig. 3), it can be seen that the law of epidemic spread is as follows: as time passes, the cumulative number of infections in each city increases rapidly in the early stage, so the epidemic spreads fast. According to the table of risk areas, the epidemic is more severe in cities, and the transmission distance is long. The number of

Existing cases in Beijing
The critical factors for the spread of the epidemic are as follows: (1) human immunity; (2) the season: because the virus survives for a long time in a low-temperature environment, the spread of the epidemic may intensify in winter; (3) public health intervention: the more effective the intervention, the easier it is to control the spread of the epidemic.
(2) Establishment of the second sub-question model
Among the traditional models of infectious diseases, there is another kind of model called the SIR Model. This model can roughly show the process from the onset to the end of an infectious disease. The core of the model is a system of differential equations. The factors considered in the model are the susceptible population (S), the infected population (I), and the removed (recovered) population (R).
These three quantities are all functions of time and can be expressed as $S(t)$, $I(t)$, and $R(t)$, where $t$ is measured in a time unit that we set.

Solving the model
1) First, according to Assumption 1, $S(t) + I(t) + R(t) = N$, where $N$ is a constant value.
2) It is assumed that the number of susceptible persons that one patient can infect in a unit of time is proportional to the total number of susceptible persons in the environment. The proportionality coefficient is denoted here by $\beta$, so that the number of persons infected by all patients in a unit of time is $\beta S(t) I(t)$.
3) At time $t$, the number of people removed from the infected population per unit of time is directly proportional to the number of patients. The proportionality coefficient is denoted here by $\gamma$, so the number of people removed per unit of time is $\gamma I(t)$.
Based on the above three conditions, we can obtain the mechanism of change of each population, i.e.
a) The decline rate of susceptible individuals is: $\beta S(t) I(t)$
b) The growth rate of infected individuals is: $\beta S(t) I(t) - \gamma I(t)$
c) The growth rate of removed individuals is: $\gamma I(t)$
The differential equations at the core of the SIR Model [6] can be expressed as follows:
$$\frac{dS}{dt} = -\beta S I, \qquad \frac{dI}{dt} = \beta S I - \gamma I, \qquad \frac{dR}{dt} = \gamma I$$
Fig. 4 and Fig. 5 are based on the geometric application of the SIR Model [7] and were drawn with Matlab. From Fig. 4, it can be predicted that Beijing will achieve zero cases in the community between May 25 and May 27. The final forecast results are consistent with the actual situation. From Fig. 5, it can be predicted that Shanghai will achieve zero cases in the community between May 21 and May 23. The final forecast results are consistent with the actual situation.
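The mechanism described above can be illustrated with a short numerical sketch. It is not the authors' Matlab program: the symbols `beta` (infection coefficient) and `gamma` (removal coefficient), the population of 1000, the parameter values, and the threshold of one infected person used as a stand-in for "zero cases in the community" are all assumptions chosen for illustration. The integration uses the classical fourth-order Runge-Kutta method.

```python
def sir_rhs(s, i, beta, gamma):
    """Rates of change (dS/dt, dI/dt, dR/dt) of the SIR equations."""
    new_infections = beta * s * i   # people infected per unit time
    removals = gamma * i            # people removed per unit time
    return -new_infections, new_infections - removals, removals


def simulate_sir(s0, i0, r0, beta, gamma, days, steps_per_day=10):
    """Integrate the SIR system with classical RK4; one (day, S, I, R) row per day."""
    h = 1.0 / steps_per_day
    s, i, r = float(s0), float(i0), float(r0)
    rows = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            k1 = sir_rhs(s, i, beta, gamma)
            k2 = sir_rhs(s + h / 2 * k1[0], i + h / 2 * k1[1], beta, gamma)
            k3 = sir_rhs(s + h / 2 * k2[0], i + h / 2 * k2[1], beta, gamma)
            k4 = sir_rhs(s + h * k3[0], i + h * k3[1], beta, gamma)
            s += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
            i += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
            r += h / 6 * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2])
        rows.append((day, s, i, r))
    return rows


def first_day_below(rows, threshold):
    """First day after the infection peak on which I(t) < threshold, or None."""
    peak_day = max(rows, key=lambda row: row[2])[0]
    for day, _, infected, _ in rows:
        if day > peak_day and infected < threshold:
            return day
    return None


# Hypothetical values for illustration only (not fitted to any city's data).
rows = simulate_sir(s0=999, i0=1, r0=0, beta=0.0005, gamma=0.2, days=300)
clear_day = first_day_below(rows, threshold=1.0)
```

With fitted parameters in place of the hypothetical ones, `clear_day` would play the role of the predicted clearance date read off the forecast curves.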

Model verification
In view of the current round of the epidemic in Shaanxi, the above model is used for forecasting. The first step is to collect the data for Shaanxi Province from August 9 to October 16; the second step is to calculate the value of each parameter in the SIR Model; and the third step is to substitute these values into the Matlab program to obtain the prediction results.
The resulting image is shown in Fig. 6. From Fig. 6, it can be predicted that the current round of the epidemic in Shaanxi will be eliminated around November 15 [8].
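The text does not state how the parameter values are calculated in the second step. One possible approach, sketched here under that caveat, is a least-squares fit of the one-day difference form of the SIR equations to daily series of S, I, and R. The function names, the symbols `beta` and `gamma`, and the synthetic series are hypothetical; no real Shaanxi data are used.

```python
def estimate_sir_parameters(s, i, r):
    """Least-squares estimates of (beta, gamma) from daily S, I, R series.

    Uses the one-day finite-difference form of the SIR equations:
        S[t+1] - S[t] ~ -beta * S[t] * I[t]
        R[t+1] - R[t] ~  gamma * I[t]
    """
    num_b = den_b = num_g = den_g = 0.0
    for t in range(len(s) - 1):
        x = s[t] * i[t]
        num_b += -(s[t + 1] - s[t]) * x
        den_b += x * x
        num_g += (r[t + 1] - r[t]) * i[t]
        den_g += i[t] * i[t]
    return num_b / den_b, num_g / den_g


def synthetic_series(s0, i0, r0, beta, gamma, days):
    """Illustrative daily data generated with one-day Euler steps (not real case data)."""
    s, i, r = [float(s0)], [float(i0)], [float(r0)]
    for _ in range(days):
        new_infections = beta * s[-1] * i[-1]
        removals = gamma * i[-1]
        s.append(s[-1] - new_infections)
        i.append(i[-1] + new_infections - removals)
        r.append(r[-1] + removals)
    return s, i, r


# Hypothetical parameters: the estimator should recover them from the series.
s, i, r = synthetic_series(9990, 10, 0, beta=0.00004, gamma=0.15, days=60)
beta_hat, gamma_hat = estimate_sir_parameters(s, i, r)
```

On real reported data the daily differences are noisy, so the estimates would only approximate the underlying coefficients, and a phased fit may be needed when the parameters change over time.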

Model establishment
A. Principal component analysis: First of all, because the data indicators required by the model differ significantly in scale, we normalize the data so that all indicators are on the same order of magnitude.
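The standard deviation standardization method is adopted, namely:
$$\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad \bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}$$
where $x_{ij}$ is the value of indicator $j$ for region $i$, $n$ is the number of regions, and $m$ is the number of indicators.
Since the above indicators may be correlated with one another, principal component analysis is applied to the standardized index data in order to analyze the indicators affecting the degree of regional material shortage more accurately. The initial variables are replaced with as few new variables as possible, while the retained principal components preserve the information in the original data.
The principal component analysis steps are as follows:
1) Calculate the correlation coefficient matrix $(r_{ij})_{m \times m}$:
$$r_{ij} = \frac{1}{n-1}\sum_{k=1}^{n} \tilde{x}_{ki}\,\tilde{x}_{kj}$$
2) Calculate the eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m \ge 0$ and the corresponding eigenvectors of the correlation coefficient matrix, and combine $m$ new index quantities from them.
3) Calculate the information contribution rate $b_j$ and the cumulative contribution rate $\alpha_p$ of the eigenvalues:
$$b_j = \frac{\lambda_j}{\sum_{k=1}^{m}\lambda_k}, \qquad \alpha_p = \frac{\sum_{k=1}^{p}\lambda_k}{\sum_{k=1}^{m}\lambda_k}$$
When the value of $\alpha_p$ is close to 1, the first $p$ indicator variables are selected as $p$ principal components to replace the original four indicators.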
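The three steps can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' Matlab code: the indicator table is made up, and the eigenvalues are obtained with cyclic Jacobi rotations, which is one standard method for symmetric matrices chosen here only to keep the example free of external libraries.

```python
import math


def standardize(data):
    """Column-wise standard deviation standardization (sample standard deviation)."""
    n, m = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(m)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
            for j in range(m)]
    return [[(row[j] - means[j]) / stds[j] for j in range(m)] for row in data]


def correlation_matrix(z):
    """Correlation coefficient matrix of already standardized data."""
    n, m = len(z), len(z[0])
    return [[sum(row[a] * row[b] for row in z) / (n - 1) for b in range(m)]
            for a in range(m)]


def jacobi_eigenvalues(matrix, sweeps=100, tol=1e-20):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, sorted descending."""
    a = [row[:] for row in matrix]
    m = len(a)
    for _ in range(sweeps):
        off = sum(a[p][q] ** 2 for p in range(m) for q in range(m) if p != q)
        if off < tol:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if abs(a[p][q]) < 1e-15:
                    continue
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(m):      # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(m):      # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[j][j] for j in range(m)), reverse=True)


def contribution_rates(eigenvalues):
    """Information contribution rates and cumulative contribution rates."""
    total = sum(eigenvalues)
    rates = [v / total for v in eigenvalues]
    cumulative, running = [], 0.0
    for rate in rates:
        running += rate
        cumulative.append(running)
    return rates, cumulative


# Hypothetical indicator table: rows are regions, columns are indicators.
data = [
    [120.0, 30.0, 5.0],
    [80.0, 45.0, 3.0],
    [200.0, 20.0, 9.0],
    [150.0, 25.0, 6.0],
    [60.0, 50.0, 2.0],
]
eigenvalues = jacobi_eigenvalues(correlation_matrix(standardize(data)))
rates, cumulative = contribution_rates(eigenvalues)
```

The number of principal components to keep is then the smallest p for which `cumulative[p - 1]` is close to 1.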
Through the analysis, the principal components of the question are population, area, GDP, and the number of hospitals in different regions. These principal components are uncorrelated with each other and retain the basic information of the original data.
B. Determine the optimal number of clusters
Determining the optimal number of clusters has a significant impact on the effectiveness of clustering. The same clustering algorithm is run with different numbers of clusters, and an index is used to evaluate the quality of each clustering result. According to the principle of the k-means clustering method, the Davies-Bouldin index is selected for evaluation.
The DB index is evaluated by describing the within-class dispersion of the samples and the distance between the class centers. The smaller the DB, the lower the similarity between the classes, and the better the clustering effect. The definition is as follows:
$$DB = \frac{1}{K}\sum_{i=1}^{K}\max_{j \ne i}\frac{\bar{d}_i + \bar{d}_j}{\lVert c_i - c_j \rVert}$$
where $K$ is the number of clusters, $c_i$ is the center of class $i$, and $\bar{d}_i$ is the average distance from the samples of class $i$ to $c_i$.
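A minimal sketch of the Davies-Bouldin index in plain Python is given here for illustration. It uses the standard definition: for each class, take the worst-case ratio of summed within-class average distances to the distance between class centers, then average over all classes. The function names and the sample points are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition given as a list of point lists."""
    centers = [centroid(c) for c in clusters]
    # Average distance of each class's samples to its own center.
    scatters = [sum(math.dist(p, center) for p in c) / len(c)
                for c, center in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatters[i] + scatters[j]) / math.dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k


# Hypothetical points: the same four samples partitioned in two different ways.
compact = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
stretched = [[[0.0, 0.0], [10.0, 0.0]], [[0.0, 2.0], [10.0, 2.0]]]
db_compact = davies_bouldin(compact)
db_stretched = davies_bouldin(stretched)
```

The partition whose classes are tight and far apart receives the smaller value, which is why the number of clusters with the lowest DB value is preferred.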

Solving the model
To determine the optimal number of clusters, we first give the range of K (the given range of K is 0-9), run the same clustering algorithm with different cluster numbers K on the data set to obtain a series of clustering results, and calculate the DB value for each cluster number, as shown in Fig. 7. Based on the results in Fig. 7, the optimal number of clusters was chosen as four. After the preprocessed data were entered into the model, the ranking of medical supply scarcity from low to high obtained with Matlab is: Chongming, Qingpu, Jinshan, Fengxian, Pudong New Area, Jiading, Songjiang, Minhang, Baoshan, Changning, Yangpu, Putuo, Jing'an, Xuhui, Huangpu, and Hongkou districts.
Through the K-Means clustering method, the material shortage degree of each district in Shanghai was divided into four levels, and the results are shown in Table 2. More medical materials should be allocated to the areas with a high material shortage. According to the above clustering results and the actual situation, the material distribution plan was formulated, as shown in Table 3 [9].
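The grading step can be illustrated with a small K-Means sketch on one-dimensional scarcity scores. The region names and scores are invented for illustration, the deterministic choice of starting centers is a simplification, and numbering the levels so that level 1 is the most severe shortage is an assumption rather than something stated in the text.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's k-means on scalars; centers start at evenly spaced order statistics."""
    ordered = sorted(values)
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Keep the old center if a group happens to be empty.
        new_centers = [sum(g) / len(g) if g else centers[j]
                       for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def assign_levels(scores, centers):
    """Map each region to a level; level 1 is the cluster with the highest scarcity."""
    ranked = sorted(range(len(centers)), key=lambda j: centers[j], reverse=True)
    level_of_cluster = {cluster: level + 1 for level, cluster in enumerate(ranked)}
    levels = {}
    for name, score in scores.items():
        nearest = min(range(len(centers)), key=lambda j: abs(score - centers[j]))
        levels[name] = level_of_cluster[nearest]
    return levels


# Hypothetical scarcity scores (higher means a more severe shortage).
scores = {"A": 0.05, "B": 0.08, "C": 0.30, "D": 0.33,
          "E": 0.60, "F": 0.62, "G": 0.90, "H": 0.95}
centers = kmeans_1d(list(scores.values()), k=4)
levels = assign_levels(scores, centers)
```

In practice the clustering would run on the principal component scores of each district rather than on a single made-up score, and the resulting levels would feed the distribution plan.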

Conclusion
This paper uses the COVID-19 epidemic data of Shanghai, Beijing, and Changchun in May to establish an SIR Model that can accurately predict the epidemic. The model is applied to the current round of the epidemic in Shaanxi in early October to predict when the epidemic will be cleared. Principal component analysis and cluster analysis were used to distribute medical materials among different areas scientifically.
For common infectious diseases, the SIR Model was introduced to study epidemic trend prediction and response strategy. First, the public data on the epidemic in Shanghai, Beijing, and Changchun were collected and preprocessed, appropriate factors were selected, and the relevant programs were written. With the help of Matlab, the functional relations between the susceptible population S, the infected population I, the recovered population R, and time t were obtained together with their fitted curves. Combining the public epidemic data of Shanghai from March 20 to May 6, 2022, with the curve fitted by the model, it was predicted that the epidemic would be wholly eliminated on May 22, 2022, and the prediction is consistent with the actual situation. The model was then applied to the current round of the epidemic in Shaanxi Province using the public data for the province. After repeated calculations, the model predicts that zero cases in the community will be achieved in Shaanxi Province around November 15, 2022. Finally, taking Shanghai as an example, principal component analysis and cluster analysis were used to divide the districts into four levels (first, second, third, and fourth) according to the scarcity of medical materials, with material distribution ratios of 40%, 30%, 20%, and 10%, respectively.
By comparing the epidemic spread in Shanghai, Beijing, and Changchun in May 2022, the SIR Model is used to predict the epidemic situation in Shaanxi in October, and principal component analysis and cluster analysis were used to optimize the allocation of medical materials. However, the technique still has many shortcomings. It depends on the model's assumptions, and transmission between the susceptible population S, the infected population I, and the recovered population R is one-way. It relies on the accuracy of the initial data, yet the processing of the raw data is cumbersome. The model has certain limitations, and solving it with differential equations is relatively tricky. Moreover, the model is sensitive to the initial values and does not introduce a feedback mechanism. The accuracy will inevitably decrease if a long period in the future is predicted solely from the existing data. A possible improvement is to introduce a variable that acts as a feedback mechanism for this model.

Establishment and solution of the model

Model establishment and solution of Problem 1

Model establishment
(1) Establishment of the first sub-question model
We used Excel to sort the large amount of data obtained and analyzed the cumulative number of infections in Shanghai, Beijing, and Changchun from February to May. We used Matlab to carry out multivariate nonlinear fitting, as shown in Fig. 1, Fig. 2, and Fig. 3.

Fig. 1 Fig. 2 Fig. 3
Fig. 1 Multivariate nonlinear fitting curve of the cumulative number of infections in Changchun [Self-drawing]

Fig. 4 Forecast map of Beijing SIR Model [Self-drawing]

Fig. 5 Forecast map of Shanghai SIR Model [Self-drawing]

Fig. 7 Determination of the DB value

Table 1 Main symbols and instructions

Table 2 Grade division table of Shanghai districts

Table 3 Percentage of scarcity by district in Shanghai