Machine learning methods application for consumer banking

Machine learning (ML) methods are effective tools for analysing many current problems in modern banking. The growth of data and rapid digitalization underpin the accelerating adoption of ML. These processes are especially noticeable in consumer banking, because banks have millions of retail customers. The first goal of our research is to provide an extended review of ML applications in consumer banking. On the one hand, we identify the most developed ML methods applied in this segment (for example, different types of regression, fuzzy clustering, neural networks, principal component analysis, etc.). On the other hand, we point out two multi-purpose tools that banks use intensively in the consumer segment, namely scoring and clustering. The second goal is to present some innovative applications of ML methods to each task. These include several applications for scoring models and an application of fuzzy clustering. All applications are aimed at making banks' business processes more effective. The applications considered were implemented on real data from the Ukrainian banking industry.


Introduction
Machine Learning (ML) is a dynamically growing class of methods with many successful applications [1,2]. One area where ML is used productively is modern banking and, more broadly, modern financial institutions. The basic reason for such use is digitalization and the intensive implementation of online technologies in the financial sphere. These processes generate Big Data that can be processed with ML. Handling these data drives the further development of existing methods and suggests new ideas. We want to emphasize some directions in banking (especially in lending) in which ML methods have been used productively and the prospects for development are promising.
The first direction concerns the generalized scoring methodology. This methodology is widely implemented in different business processes in banking: estimating the creditworthiness of borrowers (credit scoring), identifying potentially profitable customers (marketing scoring), anti-fraud systems (fraud scoring), and so on. Almost all ML methods have been used to construct scorings (first of all, credit scoring). Today's set of scoring construction methods includes Multivariate Adaptive Regression Splines, Support Vector Machine (SVM), the k-nearest neighbors method, Random Forest (RF), Extreme Gradient Boosting (XGBoost), and, of course, Artificial Neural Networks (ANN). In general, ANN may now be one of the most popular tools for building scorings. We should highlight paper [3], which presents comprehensive results comparing ML credit scorings with classical expert-based scorings. All types of scoring are being generalized to involve new (online) data. It should be added that the ranking of banks themselves also falls within this direction.
The second direction in which ML methods have been applied in many ways in banking is clustering. Modern banks operate with millions of customers. Each customer can be characterized by a vector with hundreds, maybe a thousand, components (customer characteristics). Applying ML creates strong possibilities for clustering customers, identifying their behavior and, as a consequence, improving the productivity of banking services. When a bank wants to perform clustering, it needs to identify "hidden knowledge" that will help to divide customers effectively into a set of clusters. The concept of K-means algorithms dominates here. It includes k-means, improved k-means, k-medians, and applied hierarchical clustering. In this paper we develop fuzzy clustering. Of course, the task of clustering banks themselves is also treated by such ML methods.
The third direction is the cybersecurity of banks. This direction is becoming ever more relevant with the move to online interaction with customers. The logic of applying ML methods to cybersecurity tasks may be based on Gartner's PPDR model [4]. This model points out five categories: prediction, prevention, detection, response, and monitoring. A good overview of cyber-risk and cyber-security issues for financial institutions can be found in [5]. The spectrum of ML applications in this sphere is described in [6].
We have tried to systematize ML methods in the context of the above-mentioned directions. The first two have an economic nature, and the third is more technical (though, of course, it has economic consequences). Our focus is on the first two directions, which are closely connected with business processes.
Part 2 is devoted to the literature review and describes the application of the most effective methods. Part 3 contains some illustrations of applying ML, which were partially presented in our earlier research. The conclusion offers a point of view on further development.

Materials and methods
Machine learning is the concept that a computer program can learn and adapt to new data without human interference [7]. It is a field of artificial intelligence that keeps a computer's built-in algorithms current regardless of changes in the worldwide economy. The application of machine learning methods in economics and banking is discussed in [8][9][10][11]. Machine learning includes the following classes of methods: Supervised Learning, Unsupervised Learning, Reinforcement Learning, Ensemble methods, Neural Networks and Deep Learning (table 1).
Regression. Logistic regression is a classification method that predicts a binary outcome (1/0, True/False, Yes/No, Good/Bad) for a given set of independent variables.
Dummy variables are used when constructing a logistic regression. The parameters are estimated by the maximum likelihood method. Logistic regression calculates the probability of Y given particular values of X. The log-likelihood is the sum of the logarithms of the probabilities that the model assigns to the actual values of Y. The deviance in logistic regression follows a $\chi^2$ distribution. Performance indicators for a logistic regression model are: the Akaike Information Criterion; the Null Deviance and Residual Deviance; the error (confusion) matrix as a tabular representation of actual and predicted values; and the ROC curve (Receiver Operating Characteristic).
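The estimation steps described above can be sketched in a few lines of plain Python. This is a minimal illustration only: the toy sample, the dummy variables `has_overdue` and `is_employed`, the learning rate and the number of steps are all hypothetical, and plain gradient ascent is used as one simple way of maximizing the likelihood, not as the procedure used in the study.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, x):
    """P(Y = 1 | x); weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def log_likelihood(weights, rows, labels):
    """Sum of the log-probabilities assigned to the observed labels."""
    total = 0.0
    for x, y in zip(rows, labels):
        p = min(max(predict_proba(weights, x), 1e-12), 1.0 - 1e-12)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def fit_logistic(rows, labels, lr=0.1, steps=2000):
    """Maximum likelihood by plain gradient ascent (illustrative only)."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = y - predict_proba(weights, x)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

def error_matrix(weights, rows, labels, threshold=0.5):
    """Counts keyed by (actual, predicted)."""
    counts = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for x, y in zip(rows, labels):
        predicted = 1 if predict_proba(weights, x) >= threshold else 0
        counts[(y, predicted)] += 1
    return counts

# Hypothetical toy sample: dummies (has_overdue, is_employed), label 1 = "Good".
rows = [(1, 0)] * 4 + [(1, 1)] * 2 + [(0, 1)] * 4 + [(0, 0)] * 2
labels = [0, 0, 0, 1,  0, 1,  1, 1, 1, 0,  0, 1]

weights = fit_logistic(rows, labels)
ll = log_likelihood(weights, rows, labels)
# With half of the labels "Good", all-zero weights equal the intercept-only fit.
null_ll = log_likelihood([0.0, 0.0, 0.0], rows, labels)
aic = 2 * len(weights) - 2 * ll                  # Akaike Information Criterion
null_deviance, residual_deviance = -2 * null_ll, -2 * ll
matrix = error_matrix(weights, rows, labels)
```

In this toy sample the fitted weight of `has_overdue` is negative and that of `is_employed` is positive, and the residual deviance is lower than the null deviance, which is the pattern the performance indicators are meant to reveal.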
Fuzzy clustering. Fuzzy clustering is a class of algorithms in which the assignment of data points to clusters is not "crisp" but "fuzzy". It is used in neuro-fuzzy systems to determine fuzzy sets when they are unknown a priori. The fuzzy sets are then obtained as projections of the clusters onto each dimension. Combining a priori knowledge with cluster analysis makes it possible to refine the parameters of the membership functions. The disadvantage of this method of determining fuzzy sets is the complexity of their interpretation.
One of the most commonly used methods is the fuzzy c-means (FCM) method. It assigns each object a fuzzy membership value for every cluster, based on the object's distance to the cluster centers. The closer a data point is to the center of a cluster, the higher its membership in that cluster relative to the other clusters.
The FCM method is an iterative procedure that sequentially improves an initial fuzzy partition, which is either user-defined or generated automatically by a heuristic rule. On each iteration, the values of the membership functions of the fuzzy clusters and their typical representatives (the cluster centers) are recalculated. The method terminates when an a priori specified number of iterations has been performed, or when the largest absolute difference between the membership values on two consecutive iterations falls below an a priori threshold.
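The iteration just described can be sketched as follows. This is a minimal version under stated assumptions: Euclidean distance, fuzzifier `m = 2`, a randomly generated initial partition with a fixed seed, and a hypothetical two-dimensional toy sample. None of these choices is taken from the study itself.

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Return (centers, memberships) for c fuzzy clusters."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Automatically generated initial fuzzy partition: each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Step 1: recompute cluster centers as membership-weighted means.
        centers = []
        for k in range(c):
            wts = [u[i][k] ** m for i in range(n)]
            total = sum(wts)
            centers.append([sum(w * p[d] for w, p in zip(wts, points)) / total
                            for d in range(dim)])
        # Step 2: recompute memberships from distances to the centers.
        new_u = []
        for p in points:
            dists = [math.dist(p, ctr) for ctr in centers]
            if min(dists) == 0.0:
                # Point coincides with a center: share full membership among those centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                new_u.append([v / total for v in inv])
        # Stop when the largest change in any membership falls below tol.
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break
    return centers, u

# Hypothetical toy sample with two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, u = fuzzy_c_means(points, c=2)
hard = [max(range(2), key=row.__getitem__) for row in u]  # crisp label per point
```

Each row of `u` holds one point's membership values across the clusters and sums to 1; taking the largest membership per row recovers a crisp assignment when one is needed.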
Neural network technologies. Artificial Neural Networks (ANN) are widely used in finance and insurance problems. ANNs are used to good effect in credit scoring construction. Many different methods are applied in this direction, based on representative types of supervised and unsupervised ANN. The main advantage of ANNs is that the dependency between variables does not need to be specified. The quality of ANN-based credit scoring can be explained by the Big Data available to modern banks.

Parametric scoring model based on the concept of survival
The classic scoring model is based on the linear function

$$Z = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n \qquad (1)$$

where $x_1, x_2, \dots, x_n$ are the borrower's characteristics and $a_1, a_2, \dots, a_n$ are the weights of the characteristics, which reflect their significance. Such models have been implemented at all stages of the relationship between borrowers and banks in the consumer lending segment, as illustrated in figure 1.
Our practice indicates that all these scorings have been applied, to differing degrees, in Ukrainian consumer banking. Over the last five years these models have been actively enriched with new data from digitalization and online lending. Data on customer behavior on websites, types of mobile phone use, and other data can be included in scorings.
One typical characteristic of the above-mentioned scoring model is that it is static. Below we present a specific approach that takes dynamics into account. The idea is to include a time parameter $t$ in the scoring coefficients. We call this the "dynamic credit scoring model" (2), in which $P(i)$ is the probability that the $i$-th customer (borrower) will be "Good" in the period from 0 to $t$; $x_{ij}$ is the $j$-th characteristic of the $i$-th customer, $i = 1, \dots, k$, $j = 1, \dots, n$; $a_j(t)$ is the coefficient of the model at time $t$, $j = 1, \dots, n$; and $t$ is the time of estimation. Model (2) differentiates customers much more accurately, because it shows not just the likelihood of a loan becoming overdue but the likelihood of it becoming overdue by a given point in time. This makes it possible to build the financial model more accurately. In addition, if this model is applied to application scoring, a specific strategy for working with the customer can be implemented (including monitoring, a reminder system, and so on).
Let us illustrate this approach with the construction of a collection scoring. The model was developed on Ukrainian data and applied to good effect in collection business processes [43]. It was constructed on a data pool of 50,000 debtors. The data include more than 30 characteristics, including the type and status of credit accounts, a partial history of payments on them, social and demographic parameters of debtors, and information about their education and employment. Among the many factors, nine non-correlated indicators with a significant effect on the dependent variable were identified (table 2).
Statistical significance was verified using the $\chi^2$ statistic, the Cramér coefficient, and the Information Value. The dependent variable took the value 1 if the debtor made at least one payment during the 8-quarter period, and 0 otherwise. The effect of the characteristics (table 3) on the probability of debt payment was estimated over time. For each quarter $t$, the Information Value metric was calculated. The dependent variable was the occurrence of the event "Good": the debtor made the first payment within the period from 0 to $t$ inclusive (that is, the alternative event "Bad" did not occur), where the parameter $t$ in model (2) denotes the quarter, from 1 to 8, in which this event happened.
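For reference, the Information Value of a characteristic is commonly computed as $IV = \sum_c (g_c - b_c)\,\ln(g_c / b_c)$, where $g_c$ and $b_c$ are the shares of all "Good" and all "Bad" debtors that fall into category $c$. The sketch below evaluates it per quarter; the counts are hypothetical and serve only to show the mechanics, and skipping empty categories is one simple convention among several.

```python
import math

def information_value(good_counts, bad_counts):
    """IV over the categories of one characteristic."""
    total_good, total_bad = sum(good_counts), sum(bad_counts)
    iv = 0.0
    for g, b in zip(good_counts, bad_counts):
        if g == 0 or b == 0:
            continue  # simple convention: skip categories with an empty cell
        share_good, share_bad = g / total_good, b / total_bad
        iv += (share_good - share_bad) * math.log(share_good / share_bad)
    return iv

# Hypothetical counts per category of one characteristic, by quarter t.
# "Good" = first payment made within quarters 0..t, so the counts grow with t.
counts_by_quarter = {
    1: {"good": [30, 20, 10], "bad": [270, 330, 340]},
    2: {"good": [55, 40, 25], "bad": [245, 310, 325]},
}
iv_by_quarter = {t: information_value(c["good"], c["bad"])
                 for t, c in counts_by_quarter.items()}
```

A characteristic whose "Good" and "Bad" shares coincide in every category has an IV of zero; the more the two distributions differ, the larger the value.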
Debtors were then assigned scoring points on a scale of 0 to 100 and grouped into score bands: 0-20, 20-40, 40-60, 60-80, and 80-100. Each band corresponds to a probability $P(i)$ that the $i$-th debtor will make a payment in the period from 0 to $t$; this probability depends on both the score and the time.
In all models the normalized R-squared is close to 1, so the relationship in the models is close. Each variable is significant, and the models are adequate according to Student's and Fisher's criteria. The magnitude of all coefficients decreases with each quarter, except for the constant, which compensates for this decrease.
The basic mathematical difference from the static model is that the weights change over time. This can be used to develop collection strategies while optimizing the cost of implementing them.
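The role of the time-varying weights can be sketched as a linear score whose coefficients are looked up by quarter. Only the linear part $a_0(t) + \sum_j a_j(t)\,x_{ij}$ is evaluated here; how it is mapped to $P(i)$ depends on the form of model (2) and is left out. The coefficient values and the debtor vector are hypothetical and are chosen only to mimic the reported pattern of shrinking weights and a compensating constant.

```python
def dynamic_score(x, coefficients, t):
    """Linear score a_0(t) + sum_j a_j(t) * x_j for quarter t."""
    a = coefficients[t]
    return a[0] + sum(w * v for w, v in zip(a[1:], x))

# Hypothetical coefficients [a_0(t), a_1(t), a_2(t)] for quarters 1..3:
# the weights shrink in magnitude while the constant grows.
coefficients = {
    1: [0.10, 0.40, -0.30],
    2: [0.20, 0.30, -0.20],
    3: [0.30, 0.20, -0.10],
}
debtor = [1.0, 0.5]  # hypothetical characteristic values x_1, x_2
scores = {t: dynamic_score(debtor, coefficients, t) for t in coefficients}
```

The same debtor receives a different score in each quarter, which is what allows a collection strategy to be timed rather than fixed once.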

Neural network technologies in debt portfolio management
The use of ANNs provides an efficient classification of bank debtors into groups such that the debtors in each group are similar to each other in terms of risk characteristics. The economic rationale is to manage the application of debt collection strategies. Neural networks allow the debtors from the training sample to be classified into groups of a more complex geometric shape than the classical linear discriminant function allows, and to be visualized. The most valuable property of ANNs is the ability to learn from multiple examples in cases where patterns are unknown and relationships between input and output are not obvious. In such cases, both traditional statistical and expert methods are ineffective.
Our research on consumer debtor portfolios applies ANN methodology to assess debtors. The study found it effective to structure the task into two problems: − the problem of contact with debtors; − the debtors' insolvency.
The first problem arises from the fact that a significant proportion of debtors cannot be contacted. This rules out soft-collection techniques and leads to high-cost direct contact or legal procedures, which are also costly. Practice has shown that the proportion of non-contact debtors is approximately 70-80%.
The second problem is that some of the contact debtors refuse to pay for various reasons: lack of funds, unwillingness to pay because of high penalty interest, etc. Implementing the model in practice has shown that 40-50% are contact debtors. The ANN was applied in two ways, as shown in the scheme (figure 2).
Using ANNs, two different scorings were constructed: contact scoring and solvency scoring.
Emphasis is placed on socio-demographic characteristics (age, marital status, educational level, region of residence, etc.), professional characteristics (employment status, vocational qualification, etc.), loan parameters (amount, interest rate, duration, etc.) and, of course, characteristics of the overdue debt [44].
In both cases, the first step of scoring is to apply Self-Organizing Maps (SOM), or Kohonen maps, to the training sample. A Kohonen map is a special type of neural network that identifies hidden structures and patterns through training of the network. A special algorithm performs clustering based on a two-dimensional visualization.
The neural network technology creates a series of clusters, each of which contains homogeneous debtors.
When using Kohonen maps, there is a trade-off between detail and visualization: increasing one leads to the deterioration of the other. Indeed, a more detailed consideration complicates economic analysis and makes visualization difficult. On the other hand, reducing the detail can lead to the loss of important patterns. Our analysis showed that in the cases considered it is suitable to divide the sample into 6 to 9 clusters. In our experience, splitting into 7 clusters is optimal in most cases.
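How a Kohonen map groups observations on a two-dimensional grid can be sketched as follows. This is a bare-bones version, not the configuration used in the study: the grid size, learning rate, neighbourhood width, linear decay schedules and the toy data are all assumptions made for illustration.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid coordinates of the unit whose weight vector is closest to x."""
    return min(weights, key=lambda unit: math.dist(weights[unit], x))

def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a rows x cols Kohonen map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    # Initialise every unit from a randomly chosen sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.1)   # neighbourhood shrinks over time
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])    # pull the unit towards the sample
            step += 1
    return weights

# Hypothetical debtor vectors (two scaled characteristics each).
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
weights = train_som(data, rows=2, cols=2)
clusters = {x: best_matching_unit(weights, x) for x in data}
```

Each debtor is assigned to the grid cell of its best matching unit, so a larger grid gives more detailed clusters at the cost of a map that is harder to read, which is the trade-off discussed above.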
The next question concerns correlation analysis of the debtors' characteristics. To avoid correlation problems and include only independent characteristics, it is advisable to select only those with a pairwise correlation coefficient not exceeding 0.6. Another way is to apply principal component analysis (PCA). However, the use of PCA in this case may be difficult because of the complexity of the economic interpretation of the components. After correlation analysis, the optimal levels of influence (significance) of the characteristics are determined with the help of classification trees to improve model accuracy. Our analysis of different consumer loan debt portfolios has identified the most relevant contact characteristics (table 4).
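The 0.6 correlation rule can be sketched as a simple greedy filter. The greedy order (keep a characteristic only if its absolute Pearson correlation with every characteristic already kept does not exceed the threshold) is an assumption of this sketch, and the column names and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient (assumes neither series is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)

def select_uncorrelated(columns, threshold=0.6):
    """Keep a characteristic only if |r| <= threshold against all kept ones."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical debtor characteristics.
columns = {
    "age": [25, 35, 45, 55, 65],
    "years_employed": [2, 10, 21, 30, 41],   # strongly correlated with age
    "loan_amount": [5, 1, 4, 2, 3],
}
selected = select_uncorrelated(columns)
```

Because the filter is greedy, which member of a correlated pair survives depends on the order of the columns; in practice that order would be set by economic relevance.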
The resulting scoring makes it possible to rank debtors by the probability of contact: the higher the scoring value, the more likely it is that the debtor can be contacted. Based on contact scoring, the following logic for managing arrears may be suggested. Before remote work with a new portfolio of debtors begins, the probability of making contact is assessed with the scoring. All debtors are allocated to three scoring classes: high-contact, medium-contact and low-contact. Once remote work with the portfolio has started, the debtors are actually divided into contact and non-contact. Contact debtors are then worked on further in debt collection. At the same time, debtors with a high contact score who have not in fact been contacted should continue to be worked on in order to identify contacts. It is logical not to spend time on non-contact debtors who have a low contact score. For such debtors, the following strategies may be employed:
− a write-off strategy, if the debt is not large;
− a strategy of obtaining additional information through a request to the credit bureaus: if the debtor has a significant amount of debt, further work with this debtor is given low priority; if the debtor does not have many open loans, it is given high priority;
− transfer to legal recovery, if the amount of debt is considerable.
Applying an approach based on these strategies creates a framework for seeking an optimal allocation of resources. Summing up, we can conclude that ANNs are an effective technique for developing strategies to prioritize collection efforts.
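The routing logic described in this section can be written down as a small decision function. This is only one possible reading: the amount thresholds `small_debt` and `large_debt` are hypothetical, the choice of a credit bureau request for debts between the two thresholds is an assumption, and the treatment of the medium class is not specified in the text, so the sketch returns a placeholder for it.

```python
def route_debtor(score_class, contacted, debt, small_debt=1_000.0, large_debt=20_000.0):
    """Suggest the next collection step for one debtor (illustrative thresholds)."""
    if contacted:
        return "continue debt collection"
    if score_class == "high":
        # High contact score but no contact yet: keep looking for contacts.
        return "keep searching for contacts"
    if score_class == "low":
        if debt < small_debt:
            return "write off"
        if debt >= large_debt:
            return "legal recovery"
        return "request credit bureau data"
    return "not specified"  # medium class: handling is not described in the text

examples = [
    route_debtor("low", contacted=False, debt=500.0),
    route_debtor("low", contacted=False, debt=5_000.0),
    route_debtor("low", contacted=False, debt=50_000.0),
    route_debtor("high", contacted=False, debt=5_000.0),
]
```

Encoding the rules this way makes the thresholds explicit, so they can be tuned against collection costs rather than fixed by convention.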

Fuzzy clustering of bank's consumer loan portfolio
The intensive development of consumer lending over recent decades means that banks today have hundreds of thousands or millions of borrowers in their credit portfolios. This is truly Big Data. Credit portfolios involve different segments: mortgages, car loans, unsecured consumer loans, credit cards, and others. One of the crucial objectives is clustering borrowers. Typically, this is related to lead generation. The objective is to find different clusters of borrowers who have successfully repaid previous loans. Such clusters include borrowers with different characteristics, behaviors, and preferences, so it is logical to construct a corresponding marketing strategy for each. Clustering can be done by various approaches, which differ in the economic parameters chosen as the basis for clustering and in the mathematical techniques. Our research on clustering large bank credit portfolios has led to an approach to applying ML for clustering. The first component of our approach is the identification of basic economic indicators for clustering. The second component is the application of fuzzy clustering.
First component realization. We have chosen three indicators within the framework of the first component. They combine risk and profitability indicators. By our logic, these indicators can be calculated for borrowers who have closed loans and to whom the bank is planning to offer a lead-generating product. These indicators are: − the score of the credit bureau(s); − the amount of the loans which the borrower was granted; − the level of overpayment on previous loans.
The economic logic of these indicators is as follows. The credit bureau score provides information about the risk level of the borrower. A low score (high risk) identifies a "bad" borrower who does not pay or is overloaded with debt. It may be logical to exclude such borrowers from further consideration or to apply a special approach constructed for the corresponding cluster. A high score (low risk) identifies a "good" borrower who pays off his or her loans. Such a borrower is attractive for lead generation but is not highly profitable, because he or she pays "accurately and timely". An average score corresponds to a borrower who periodically falls overdue but then pays the whole debt with penalties, fees, and other charges. In reality, such a borrower is more profitable. Of course, this is an average picture; different types of borrowers may fall into this category.
The second indicator is the loan amount. The basic economic logic here is that a low amount generates low profit for the bank but more often leads to overpayment.
Our third indicator reflects the profitability of the borrower. We conceptually followed the approach of Storbacka [45] for individual customers and extended it to borrowers. According to our approach, borrowers can be divided into four classes: A, borrowers with high overpayments; B, borrowers who pay "accurately and timely"; C, borrowers who made some payments but did not repay the amount of the loan; D, FPD (first payment default, with no payments at all).
Of course, type A is the most profitable. Type D should be excluded in any case.
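The four classes can be operationalized in several ways; the sketch below is one hypothetical reading. It assumes that a borrower's record holds the loan principal, the total amount due under the original schedule and the total amount actually paid, and that "high overpayment" means paying at least 10% more than scheduled. The argument names and the 10% threshold are assumptions, not values from the study.

```python
def borrower_class(principal, scheduled_total, total_paid, high_overpayment=0.10):
    """Assign profitability class A, B, C or D to a closed loan (illustrative rule)."""
    if total_paid <= 0:
        return "D"   # first payment default, no payments at all
    if total_paid < principal:
        return "C"   # some payments, but the loan amount was not repaid
    overpayment = (total_paid - scheduled_total) / scheduled_total
    return "A" if overpayment >= high_overpayment else "B"

# Hypothetical loans: (principal, scheduled total, total paid).
loans = [(1000, 1200, 0), (1000, 1200, 600), (1000, 1200, 1200), (1000, 1200, 1500)]
classes = [borrower_class(*loan) for loan in loans]
```

The resulting class can then be used alongside the credit bureau score and the loan amount as the third input to clustering.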
Second component realization. The above-mentioned assessments of borrowers give rise to the clustering task. The clustering should consider the risk of insolvency, the profit from overpayment, and the amount of the loan. Clustering is economically significant because a (marketing) strategy can be constructed for each cluster. The classical C-means approach leads to a "strict" separation. Very often this may not be optimal, because some borrowers can belong "fuzzily" to different clusters with different values of the membership functions. We therefore applied fuzzy C-means clustering. The conceptual difference between the approaches is illustrated in figure 3.
The economic benefit is the construction of more effective lead-generating strategies for such customers. Indeed, classical C-means clustering yields 3 clusters (if we specify 3), whereas fuzzy C-means clustering provides 7 clusters. This allows a more adequate set of strategies to be formed.
Thus, fuzzy clustering leads to a more advanced approach to cluster creation, which is adequate for forming lead-generation strategies.
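One way to read the step from 3 to 7 clusters, offered here as an interpretation rather than as the procedure of the study, is that a borrower is assigned to every fuzzy cluster in which its membership is significant. With 3 base clusters there are $2^3 - 1 = 7$ non-empty combinations: three "pure" segments, three pairwise overlaps and one overlap of all three. The membership threshold of 0.25 below is hypothetical.

```python
from itertools import combinations

def membership_segments(n_clusters):
    """All non-empty combinations of base clusters."""
    return [combo for r in range(1, n_clusters + 1)
            for combo in combinations(range(n_clusters), r)]

def segment_of(memberships, threshold=0.25):
    """Clusters in which a borrower's membership reaches the threshold."""
    segment = tuple(k for k, u in enumerate(memberships) if u >= threshold)
    if not segment:
        # Fall back to the single strongest cluster.
        segment = (max(range(len(memberships)), key=memberships.__getitem__),)
    return segment

segments = membership_segments(3)
pure = segment_of([0.90, 0.05, 0.05])        # belongs clearly to cluster 0
pairwise = segment_of([0.45, 0.45, 0.10])    # shared between clusters 0 and 1
mixed = segment_of([0.34, 0.33, 0.33])       # shared between all three
```

Under this reading, borrowers in the overlap segments can receive strategies that blend those of the base clusters, which is the practical gain over a strict separation.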

Conclusion
Machine Learning has been developing successfully in many spheres, and some of its applications are impressive. Modern banking is an excellent "test site" for different ML methods and techniques. This is mainly due to Big Data and to an understanding of the practical importance of applying ML. Indeed, the average retail bank typically has an intensive inflow of customers and a very large number of customers in its portfolio. Each customer, whether from the inflow or from the portfolio, can be characterized by hundreds of indicators. This is Big Data with hidden patterns of customer behavior and preferences.
Scoring and clustering are tools for business development in consumer banking. It is entirely logical to apply ML and AI to the construction of effective scoring assessments and clustering procedures. The results are multifaceted. Our paper has illustrated several solutions for consumer banking based on machine learning applications. All these solutions were implemented and have proved effective. The huge growth of data due to digitalization in banking and the development of fintech produces new objectives and new spheres for applying ML and AI.
Machine learning methods are self-developing areas of research with synergetic interaction between them. This fact has been taken into account in the examples presented.