Comparison of neural networks and regression time series in estimating the Czech Republic and China trade balance

Foreign trade has long been considered very important. Measuring the trade balance provides one of the best analyses of a country's external economic relations. It serves as a monetary expression of economic transactions between a country and its foreign partners over a certain period. The aim of this paper is to compare the accuracy of time series alignment by means of regression analysis and neural networks, using the example of the trade balance between the Czech Republic and the People's Republic of China. The data used are the monthly trade balance between the two countries, starting in 2000 and ending in July 2018. First, linear regression is performed, followed by regression using artificial neural networks. The two methods are then compared at expert level, drawing on the experience of the evaluator, an economist. Visually, the LOWESS curve appears to be the best of the linear regression fits, and the fifth preserved network, RBF 1-24-1, appears to be the most appropriate of the neural networks.


Introduction
Foreign trade has always been considered to be of the utmost importance and has always been given great attention. The balance of payments is recognized as an instrument that provides one of the best analyses of a country's external economic relations and serves as a monetary expression of economic transactions between a country and its foreign partners over a certain period [1].
We can define the balance of payments as a systematic statistical record that captures all economic transactions between domestic and foreign entities over a certain period of time. It includes transactions related to the movement of goods, services and earnings, operations with financial receivables and payables, and operations referred to as transfers [2]. The balance of payments consists of two basic accounts, a current account and a capital account [3]. The current account of the balance of payments reflects the movement of real resources (e.g. goods, services, transfers), the capital account mainly depicts the international

Data and methods
Data for the analysis are available from the World Bank website, among other sources. The analysis uses data on the trade balance between the Czech Republic and the People's Republic of China, defined as the difference between total exports and total imports between the two countries from the Czech point of view. The series consists of monthly balances from January 2000 to July 2018, i.e. 223 observations. The unit is the euro.
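The number of observations and the definition of the balance can be checked with a short sketch. The function names are illustrative, and the export and import figures in the last lines are made-up placeholders, not values from the data set.

```python
from datetime import date

def month_range(start, end):
    """Return (year, month) pairs from start to end, both inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months

def trade_balance(exports, imports):
    """Balance from the exporting country's point of view: exports minus imports."""
    return exports - imports

months = month_range(date(2000, 1, 1), date(2018, 7, 1))
print(len(months))  # 223 monthly observations

# Placeholder figures only (hypothetical, in euros): a deficit yields a negative balance.
example_balance = trade_balance(exports=100.0, imports=250.0)
print(example_balance)  # -150.0
```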
The descriptive characteristics of the data are given in Table 1. Of particular interest is the development of the balance over time. Figure 1 therefore shows the selected statistical characteristics in graphical form, including a histogram of the input data. Dell's Statistica software, version 12, will be used for data processing. First, linear regression will be performed. Afterwards, neural networks will be used for regression.
Linear regression will be performed on the monitored data sample for the following functions: linear, polynomial, logarithmic, exponential, distance-weighted least squares, and negative-exponential smoothing least squares. First, the correlation coefficient, i.e. the dependence of the trade balance on time, will be calculated. We will work at a confidence level of 0.95.
Regression will then be performed using neural structures. We will generate multilayer perceptron networks and radial basis function networks. Time will be the independent variable, and the trade balance between the Czech Republic and the PRC will be the dependent variable. We divide the time series into three sets: training, testing, and validation. The training set will contain 70% of the input data and will be used to generate the neural structures. The remaining two sets will each contain 15% of the input data. Both will serve to verify the reliability of the neural structure, or model, that is found. The delay of the time series will be 1.
We will generate 10,000 neural networks and preserve the 5 with the best characteristics. The least squares method will serve as the error function. Training will be terminated when there is no further improvement, i.e. no further reduction in the sum of squares. Thus, we will preserve those neural structures whose sum of squared residuals with respect to the actual trade balance between the Czech Republic and the PRC is as low as possible (ideally zero). The hidden layer will contain at least 2 and at most 50 neurons; for radial basis function networks, it will contain at least 21 and at most 30. For the multilayer perceptron networks, we will consider the following activation functions in the hidden layer and in the output layer: Linear, Logistic, Atanh, Exponential, and Sine. Other settings are left at their defaults (using the ANS tool, automated neural networks).
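The 70/15/15 division of the 223 observations can be sketched as follows. This is only an illustration under stated assumptions: the random assignment, the rounding rule and the seed are choices made here, and Statistica may assign cases to the sets differently.

```python
import random

def split_indices(n, train=0.70, test=0.15, seed=0):
    """Randomly split range(n) into training, testing and validation index lists.

    The validation set receives whatever remains after the first two sets,
    so the three sets always cover all n observations exactly once.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seed is an arbitrary, assumed value
    n_train = round(n * train)
    n_test = round(n * test)
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

train_idx, test_idx, val_idx = split_indices(223)
print(len(train_idx), len(test_idx), len(val_idx))  # 156 33 34
```

Because 223 is not divisible into exact 70/15/15 shares, the set sizes are only approximately in that ratio.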
Finally, we compare the results of linear regression and of regression using neural networks. The comparison will not take the form of residual analysis (minimum and maximum values, spread of residuals, etc.), but will be made at expert level, drawing on the experience of the assessor, an economist.

Linear regression
The correlation coefficient is -0.8830, which represents a statistically significant inverse dependence of the trade balance on time. The coefficient of determination reaches 0.7797.
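For a simple linear regression, the coefficient of determination equals the square of the correlation coefficient, which is consistent with the two reported values. The sketch below checks this arithmetic and shows how a Pearson correlation is computed; the toy series is invented for illustration and is not the trade balance data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Reported correlation coefficient and the implied coefficient of determination.
r_reported = -0.8830
print(round(r_reported ** 2, 4))  # 0.7797

# Toy data (hypothetical): a series that falls steadily over time.
toy_time = [1, 2, 3, 4]
toy_balance = [8, 6, 4, 2]
r_toy = pearson_r(toy_time, toy_balance)
```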
A scatter plot was constructed (see Figure 2) and fitted with a regression curve, in this case a linear one. The parameters of the line are shown in the figure. In the following case as well, the solid line represents the regression curve. Its shape resembles a straight line, i.e. a linear function. Here, too, the development of the trade balance between the partner countries is not fully captured. Figure 4 shows the scatter plot fitted with a logarithmic function. The logarithmic curve also closely resembles a straight line and therefore a linear function. From the location of the individual points in the graph, it is obvious that the logarithmic function is not suitable for the regression. Figure 5 shows the trade balance between the Czech Republic and the PRC fitted with the LOWESS function. The LOWESS (locally weighted scatterplot smoothing) function, also known as LOESS (locally estimated scatterplot smoothing), is calculated as a regression function over partial intervals. The LOWESS curve aligns the time series and follows its trend appropriately. The fit is promising: it captures the trend of the trade balance between the partner countries fairly well. The curve follows the development of the trade balance over its entire interval better than the linear, polynomial and logarithmic functions. Figure 7 shows the fit obtained by least squares with negative-exponential smoothing. This curve also appears interesting and suitable for a possible prediction. It likewise gives better results than the linear, polynomial and logarithmic functions.
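The idea of a regression function calculated over partial intervals can be illustrated with a simplified LOWESS sketch. It is an assumption-laden illustration, not the Statistica implementation: it uses a tricube kernel, fits one weighted straight line per point, omits the robustness iterations of the full procedure, and the `frac` parameter and the synthetic series are chosen here only for demonstration.

```python
import math

def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def lowess(x, y, frac=0.3):
    """Simplified LOWESS: one weighted linear fit per point, no robustness passes."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # number of points in each local window
    fitted = []
    for x0 in x:
        dists = [abs(xi - x0) for xi in x]
        h = sorted(dists)[k - 1]  # bandwidth: distance to the k-th nearest point
        if h > 0:
            w = [tricube(d / h) for d in dists]
        else:  # degenerate window: only exact matches receive weight
            w = [1.0 if d == 0 else 0.0 for d in dists]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))  # local line evaluated at x0
    return fitted

# Synthetic series (hypothetical): a curved downward trend with alternating noise.
t = list(range(40))
y = [-0.05 * ti ** 2 + (3.0 if ti % 2 else -3.0) for ti in t]
smooth = lowess(t, y, frac=0.4)
```

A larger `frac` widens each local window and yields a smoother curve; a smaller value follows short-term fluctuations more closely.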
The correlation coefficient of -0.8830 indicates a statistically significant inverse dependence of the target variable on time; the coefficient of determination is 0.7797. If we evaluated the results only by comparing the shape of each regression curve with the actual trade balance between the Czech Republic and the PRC, we could say that the LOWESS curve follows the development most closely. It is followed by the curves obtained by the least squares method, namely negative-exponential smoothing and distance weighting. All three follow the basic course of the trade balance between the Czech Republic and the PRC. At the same time, it should be recognized that the other functions (linear, polynomial and logarithmic) are not suitable for aligning the time series, because they oversimplify its course.

Neural structures
Following the established procedure, 10,000 neural networks were generated. The five networks with the best parameters were preserved. An overview is given in Table 2. All of them are radial basis function networks. The input layer has only one variable, time. The hidden layer contains from 22 to 30 neurons. The output layer has a single neuron, and the only output variable is the trade balance between the Czech Republic and the PRC. The RBFT training algorithm was applied to all networks. In addition, all neural structures use the same activation function in the hidden layer, namely the Gaussian curve. They also use the same activation function in the output layer, namely the identity function (see Table 2).
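The structure described above (one input, Gaussian hidden neurons, identity output) can be sketched as a forward pass. This is a minimal illustration under assumptions: the class name, the particular Gaussian parameterization, and all centers, widths, weights and the bias are hypothetical, and the sketch does not reproduce the RBFT training algorithm or the parameters of the preserved networks.

```python
import math

class RBFNetwork:
    """One-input RBF network: Gaussian hidden layer, identity output neuron."""

    def __init__(self, centers, widths, weights, bias=0.0):
        self.centers = centers  # one center per hidden neuron, on the time axis
        self.widths = widths    # one Gaussian width (sigma) per hidden neuron
        self.weights = weights  # output weight of each hidden neuron
        self.bias = bias

    def hidden(self, t):
        """Gaussian activations exp(-(t - c)^2 / (2 * sigma^2)) for input t."""
        return [math.exp(-((t - c) ** 2) / (2.0 * s ** 2))
                for c, s in zip(self.centers, self.widths)]

    def predict(self, t):
        """Identity output: bias plus the weighted sum of hidden activations."""
        return self.bias + sum(w * h for w, h in zip(self.weights, self.hidden(t)))

# Hypothetical three-neuron network; the preserved networks have far more hidden neurons.
net = RBFNetwork(centers=[0.0, 100.0, 200.0],
                 widths=[40.0, 40.0, 40.0],
                 weights=[-1.0, -2.0, -4.0],
                 bias=0.5)
prediction = net.predict(150.0)
```

Each hidden neuron responds most strongly near its own center, so the output at a given time is dominated by the neurons whose centers lie close to that time.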
The training, testing and validation performance is also of interest. In general, we are looking for a network whose performance is the same on all sets of data (recall that the data were distributed among the sets randomly). The error should be as small as possible. Regarding the error, it should be recalled that we are working with a balance expressed in euros, i.e. at a fine level of detail, which is reflected in the magnitude of the calculated error.
The performance on the individual sets of data is expressed as a correlation coefficient. The values for the individual data sets and neural networks are presented in Table 3. The table shows that the performance of all preserved neural structures is very high. Larger fluctuations in performance between the individual data sets appear for networks 2. RBF 1-21-1 and 3. RBF 1-29-1. The correlation coefficient on the training data set ranges from more than 0.89 to over 0.94 across the networks. On the testing data set, it ranges from more than 0.85 to over 0.94. On the validation data set, it is above 0.949 for all neural networks. In order to select the most suitable neural structure, we need to analyze the results obtained. Table 4 provides the basic statistical characteristics of each set of data for all neural structures. Ideally, the individual statistics of a neural network (minimum, maximum, residuals, etc.) are consistent across all sets. However, in the case of the aligned time series, the differences are noticeable. They are apparent both between the individual data sets of one neural network and between the individual neural networks. Somewhat larger differences appear in the residual characteristics. However, we are not able to identify clearly which of the preserved neural networks has the most appropriate results. Figure 8 shows the actual development of the trade balance between the Czech Republic and the PRC, together with the predictions of the individual generated and preserved networks.
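The kind of per-set residual characteristics referred to above can be computed as in the following sketch. The function name and the numbers are hypothetical placeholders; they are not values from Table 4.

```python
def residual_summary(actual, predicted):
    """Minimum and maximum residual and the sum of squared residuals."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return {
        "min_residual": min(residuals),
        "max_residual": max(residuals),
        "sum_of_squares": sum(r * r for r in residuals),
    }

# Placeholder values for one data set (hypothetical).
actual_values = [1.0, 2.0, 3.0]
predicted_values = [1.5, 2.0, 2.0]
summary = residual_summary(actual_values, predicted_values)
print(summary)  # {'min_residual': -0.5, 'max_residual': 1.0, 'sum_of_squares': 1.25}
```

Computing the same summary separately for the training, testing and validation sets makes it possible to see whether a network behaves consistently across them.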
The chart shows that the individual neural networks predict the development of the trade balance differently in different intervals. However, what matters is not the similarity between the predictions of the individual networks, but their similarity (or degree of agreement) with the actual development of the trade balance.

Conclusion
The aim of the paper was to compare the accuracy of time series alignment using regression analysis and neural networks on the example of the trade balance between the Czech Republic and the People's Republic of China. In general, every prediction holds only with a certain degree of probability. When we predict the future development of any variable, we estimate it on the basis of data from previous years. Although we can include most of the factors influencing the target variable in the model, we always simplify reality, and we always work with a certain degree of probability that one of the predicted scenarios will be fulfilled. Both linear regression and regression using neural networks involve simplification, and quite a substantial one. We work with only two variables: input (time) and output (the trade balance between the Czech Republic and the PRC). We therefore completely ignore other input variables, which often have a significant impact on the trade balance between the Czech Republic and the PRC (the international political situation, taxation in both countries, production factor prices, state export support, natural resources and many others). The main determinant is then the purpose for which we perform the calculation. If we want to estimate the future development of the individual countries' economies based on the trade balance between the Czech Republic and the PRC, such simplification is likely to be sufficient and the calculation will have adequate informative value. If, for example, we are planning the shipping capacity of a particular company, such a simplified time series will not be enough. It remains true that aggregated variables can be estimated better than partial ones.
At the same time, we can state that, due to the great simplification of reality, it is not possible to predict the emergence of extraordinary situations and their impact on the trade balance between the Czech Republic and the PRC (perhaps in the short term, but certainly not in the long term). A prediction for the next few days would be ideal, but data for such a short horizon are currently not available.
The trade balance of both countries can be determined on the basis of statistical methods, causal methods and intuitive methods. In this case, we compared statistical methods. However, they only give us a possible framework for the development of the monitored variable. It is important to work with information on the possible future development of the economic, political or legal environment. If we can predict this development, we can then project it into the monitored trade balance. At the same time, however, the personality of the evaluator comes into play: an economist who, on the basis of knowledge and experience, corrects the value determined by the statistical methods and refined on the basis of causal links.
Visually, the LOWESS curve was the best of the linear regression fits, followed by the least squares curve with negative-exponential smoothing and the curve obtained by distance-weighted least squares. Of the neural networks, network 5. RBF 1-24-1 proved to be the most usable in practice. From the point of view of performance measured by the correlation coefficient, network 5. RBF 1-24-1 also remains usable. The aim of the paper was fulfilled.