Use of the C4.5 Algorithm to Analyze Student Interest in Continuing to College

Higher education is one of several options open to a student who completes high school. However, not all students wish to continue to college, and there are many reasons why they do not. The factors considered in this research are self-motivation, scholarships provided, parental support, ease of getting a job and academic competition. This study took a sample of 25 student records to find the factors that most influence the decision to continue to higher education. It uses a data mining classification approach, building a decision tree with the C4.5 method. The results indicate that a student's desire to continue to college is driven mainly by the expectation that it will make it easier to get a job and by the availability of an education scholarship.


Introduction
The quality of human resources is seen as one of the key factors in the era of free trade. Growing competition and demands in the world of work require quality human resources who have the necessary competencies and are able to develop themselves and together build the nation [1]. One way to improve the quality of human resources is through education [2].
Higher education is organized to prepare students to become members of the community who have academic and/or professional abilities who can apply, develop and/or create science, technology and/or art. Therefore, universities as educational units that organize higher education play a very important role in creating quality human resources, so that rapid global changes can be responded to by existing education products.
Given the difficulty of getting a job in the midst of competition in the wider community, it is very important for higher education graduates to have sufficient abilities and skills to enter the workforce [3]. Preparation for work in higher education concerns not only theory but also practice. In addition, graduates find it easier to get a job with a better position. In fact, many students do not want to continue their education to college, and there are various underlying reasons. In this study we have conducted research in a high school on the factors that generally influence whether students continue their education in college or not. Among them are parental support, self-motivation, scholarships, ease of getting a job and academic competition.
* Corresponding author: lintang@unmus.ac.id
To examine this problem in more depth, we use data mining. Data mining can detect factors at work in the educational environment, for example the factors influencing active commuting to school [4]. We then use the C4.5 algorithm as the classifier for extracting information from the data. The C4.5 algorithm has been reported to give good results with a fairly high level of accuracy [5], to predict well on training data, and to give increased accuracy when larger training data are used [6]. This study is expected to find the most influential and dominant factors behind high school students' interest in continuing their education to college, using the data set that we collected at their school.

Research Methods
Decision trees have been applied to difficult problems in ideological education: one study puts forward a decision tree optimization algorithm for decision making in ideological education, using classification techniques on cloud services. Its experiments verify the effectiveness of the decision tree algorithm and provide an empirical analysis of students' ideological education based on the level of orientation [7]. The decision tree method is one of the most popular classification methods in machine learning. Decision trees have also been implemented successfully in an intelligent m-learning system, where the method helps improve students' learning abilities and, in turn, their academic performance [8].
In education, decision trees can also be used to evaluate scholarship awards in universities. These scholarships are very important because they are aimed at students who work hard to achieve at home and abroad, and the method has been used to evaluate scholarships fairly and efficiently [9]. Classification methods in machine learning can predict student performance in the learning process [10]. Such classification relates students' academic talent to particular skills in order to improve their achievement, and the analysis helps educational institutions avoid failure in educating students. That study predicts student performance in a course using both student and teacher data. The C4.5 method can classify and predict the solution to a problem accurately [11].
The method suggested in this paper for predicting the most influential factors in students' interest in continuing their education to college combines data mining and decision tree learning. It has four main stages: data collection, classification, predictive modeling and evaluation [12].

Decision Tree
The decision tree is a well-known predictive model that can be applied in many different fields [13]. In general, decision trees are built using an algorithmic approach that identifies ways to split a data set based on different conditions [14]. It is one of the most widely used supervised learning methods. It is non-parametric and is used for both classification and regression tasks. Decision rules are generally in the form of 'if-then-else' statements. The deeper the decision tree, the more complex the rules [15].
A decision tree has internal nodes, at each of which a suitable attribute is chosen to ask a question, and leaves, which represent the actual class labels. Decision trees are used for non-linear decision making. A decision tree classifies an instance by sorting it down the tree from the root to one of the leaves, and that leaf gives the classification. Each node in the tree serves as a test case for some attribute, and each edge descending from that node corresponds to one of the possible answers to the test case. Decision trees can be easily converted into classification rules. Two types of decision tree are used in data mining [16]: a) classification trees, which predict the class to which a record belongs, and b) regression trees, which predict a real-valued outcome.
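The root-to-leaf traversal and the conversion into if-then rules described above can be illustrated with a minimal Python sketch. The tree below is purely illustrative: its shape, the attribute names and the class labels `continue` / `not_continue` are assumptions made for the example and are not the tree reported in this paper.

```python
# Hypothetical tree. Internal nodes are dicts with the attribute tested and one
# branch per answer; leaves are class-label strings. Names are assumptions.
tree = {
    "attribute": "ease_of_getting_job",
    "branches": {
        "yes": "continue",
        "no": {
            "attribute": "scholarship",
            "branches": {"yes": "continue", "no": "not_continue"},
        },
    },
}


def classify(node, record):
    """Walk from the root to a leaf, following the branch that matches the record."""
    while isinstance(node, dict):
        value = record[node["attribute"]]
        node = node["branches"][value]
    return node


def extract_rules(node, conditions=()):
    """Return one 'IF ... THEN ...' rule for every root-to-leaf path."""
    if not isinstance(node, dict):
        premise = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {premise} THEN {node}"]
    rules = []
    for value, child in node["branches"].items():
        step = (node["attribute"], value)
        rules.extend(extract_rules(child, conditions + (step,)))
    return rules


student = {"ease_of_getting_job": "no", "scholarship": "yes"}
print(classify(tree, student))
for rule in extract_rules(tree):
    print(rule)
```

Each leaf yields exactly one rule, so a deeper tree produces longer rule premises, which is the sense in which deeper trees give more complex rules.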

C4.5 Algorithm
The C4.5 algorithm is an extension of the ID3 algorithm proposed by Ross Quinlan [17]. It is used here as the basis for selecting the attributes to test during classification. The steps for using the C4.5 algorithm to predict student interest are as follows.
Step 1: Calculate the initial entropy for the sample distribution.
$$\mathrm{Entropy}(S) = \sum_{i=1}^{n} -p_i \log_2 p_i$$
Step 2: Calculate the entropy for each test attribute and, from it, the gain of that attribute.
$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|}\,\mathrm{Entropy}(S_i)$$
where $S$ is the case collection, $A$ is the feature, $n$ is the number of partitions, $p_i$ is the proportion of $S_i$ to $S$, and $|S_i|$ and $|S|$ are the numbers of cases in partition $S_i$ and in $S$.
The procedure is as follows:
1) Prepare the training data. Training data are usually taken from historical (past) data that have already been grouped into certain classes.
2) Determine the root of the tree. The root is chosen from the candidate attributes by calculating the gain value of each attribute; the attribute with the highest gain becomes the first root. Before calculating the gain of an attribute, first calculate the entropy value.
3) Calculate the gain value.
4) Repeat step 2 and step 3 until all records are partitioned.
5) The decision tree partitioning process stops when: (a) all records in node N have the same class; (b) there are no attributes left with which to partition the records; (c) there are no records in an empty branch.
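The entropy and gain calculations in the steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the four records, the attribute names `job` and `scholarship`, and the target name `interest` are hypothetical. The `gain_ratio` helper is included only for reference, because C4.5 as published by Quinlan normalizes gain by split information, whereas the procedure in this paper selects attributes by gain.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy(S): sum of -p_i * log2(p_i) over the classes present in labels."""
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def partition(records, attribute):
    """Group records by their value of the given attribute."""
    groups = {}
    for record in records:
        groups.setdefault(record[attribute], []).append(record)
    return groups


def gain(records, attribute, target):
    """Gain(S, A): Entropy(S) minus the size-weighted entropy of each partition."""
    total = len(records)
    remainder = sum(
        len(subset) / total * entropy([r[target] for r in subset])
        for subset in partition(records, attribute).values()
    )
    return entropy([r[target] for r in records]) - remainder


def gain_ratio(records, attribute, target):
    """Gain divided by split information (the entropy of the attribute's values)."""
    split_info = entropy([r[attribute] for r in records])
    return gain(records, attribute, target) / split_info if split_info > 0 else 0.0


# Hypothetical records, for illustration only.
records = [
    {"job": "yes", "scholarship": "yes", "interest": "continue"},
    {"job": "yes", "scholarship": "no", "interest": "continue"},
    {"job": "no", "scholarship": "yes", "interest": "not_continue"},
    {"job": "no", "scholarship": "no", "interest": "not_continue"},
]
for attribute in ("job", "scholarship"):
    print(attribute, gain(records, attribute, "interest"))
```

In this toy data `job` separates the two classes perfectly, so it receives the full gain, while `scholarship` leaves both partitions evenly mixed and receives none.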

Data Selection and Data Preparation
The variables that affect students' interest in continuing their studies to college are processed using data mining techniques. Each record in the existing data is given a class label that represents the final decision. The variables used in building the decision tree are academic competition, parental support, self-motivation, ease of getting a job and getting a scholarship.

Data Preprocessing
Data preprocessing is applied to fill in missing or inappropriate attribute values and to correct inconsistencies in the data. The steps needed for data preprocessing are (1) transforming the data into a form suitable for the data mining process, and (2) reducing the data to eliminate unnecessary attributes. Table 1 shows the data that have been repaired and are ready to be processed.
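The preprocessing steps above can be sketched in Python. The paper does not state how missing values were filled or which attributes were removed, so everything specific here is an assumption made for illustration: filling a gap with the most frequent observed value, normalizing spellings to lower case, dropping a `name` column, and the three example records themselves.

```python
from collections import Counter


def normalize(value):
    """Map inconsistent spellings such as ' Yes' or 'YES' to one form."""
    return value.strip().lower() if isinstance(value, str) else value


def fill_missing(records, attribute, missing=(None, "")):
    """Replace missing values of one attribute with its most frequent observed value."""
    observed = [r[attribute] for r in records if r.get(attribute) not in missing]
    if not observed:
        raise ValueError(f"no observed values for {attribute!r}")
    mode = Counter(observed).most_common(1)[0][0]
    return [
        {**r, attribute: mode if r.get(attribute) in missing else r[attribute]}
        for r in records
    ]


def drop_attributes(records, unwanted):
    """Data reduction: remove attributes that are not needed for mining."""
    return [{k: v for k, v in r.items() if k not in unwanted} for r in records]


# Hypothetical raw records, for illustration only.
raw = [
    {"name": "A", "scholarship": " Yes", "parental_support": "yes"},
    {"name": "B", "scholarship": "YES", "parental_support": None},
    {"name": "C", "scholarship": "no", "parental_support": "yes"},
]
clean = [{k: normalize(v) for k, v in r.items()} for r in raw]
clean = fill_missing(clean, "parental_support")
clean = drop_attributes(clean, {"name"})
print(clean)
```

Normalizing before filling matters: otherwise ' Yes' and 'YES' would be counted as different values when the most frequent one is chosen.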

Turning Data Into Trees
To convert the data into a tree, the data are first expressed as a table of attributes and records. Each attribute is a parameter that serves as a criterion in forming the tree.
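As a small sketch of this tabular form, the Python below stores each record as a mapping from attribute name to value and counts the class distribution for each value of one attribute, which is the input needed for an entropy calculation. The column names, the target name `interest` and the four rows are hypothetical and are not the data in Table 1.

```python
from collections import Counter, defaultdict

# Assumed column names for the five variables and the class label.
ATTRIBUTES = [
    "academic_competition",
    "parental_support",
    "self_motivation",
    "ease_of_getting_job",
    "scholarship",
]
TARGET = "interest"

# Hypothetical rows, one tuple per record, in the column order above.
rows = [
    ("high", "yes", "high", "yes", "yes", "continue"),
    ("low", "yes", "low", "yes", "no", "continue"),
    ("high", "no", "low", "no", "no", "not_continue"),
    ("low", "no", "high", "no", "yes", "continue"),
]
table = [dict(zip(ATTRIBUTES + [TARGET], row)) for row in rows]


def class_counts(records, attribute, target):
    """For each value of the attribute, count how many records fall in each class."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record[attribute]][record[target]] += 1
    return {value: dict(c) for value, c in counts.items()}


print(class_counts(table, "scholarship", TARGET))
```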

Determining the Entropy Value and Gain Value
In the sample data, the first step is to calculate the entropy and gain values for every attribute in the table in order to find the first node. From this calculation, ease of getting a job is obtained as the attribute with the largest gain value, so it is used as the root node for the subsequent search. The calculation continues, selecting at each branch the remaining attribute with the highest gain value, until all records in every branch have the same class. The calculation results for node 1 are shown in Table 2. After all the calculations to determine the branching have been completed, the decision tree shown in Figure 1 is generated.
Fig. 1. Decision tree to analyze the most influential student factors for continuing education.
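The node-selection procedure described above (pick the attribute with the highest gain, split, and repeat in each branch until the records share one class) can be sketched as a short recursive function. This is a simplified illustration under stated assumptions, not the computation behind Table 2 or Figure 1: it selects by gain only, handles categorical attributes only, omits pruning, and returns the majority class when no attributes remain. The six records and the names `job`, `scholarship` and `interest` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Sum of -p_i * log2(p_i) over the classes present in labels."""
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def gain(records, attribute, target):
    """Entropy of the records minus the size-weighted entropy after splitting."""
    total = len(records)
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy([r[target] for r in records]) - remainder


def build_tree(records, attributes, target):
    """Recursively split on the highest-gain attribute."""
    labels = [r[target] for r in records]
    if len(set(labels)) == 1:          # all records have the same class
        return labels[0]
    if not attributes:                 # no attributes left: use the majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(records, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    # Branches are created only for values that occur, so none is empty.
    for value in sorted({r[best] for r in records}):
        subset = [r for r in records if r[best] == value]
        branches[value] = build_tree(subset, remaining, target)
    return {"attribute": best, "branches": branches}


# Hypothetical records, for illustration only.
data = [
    {"job": "yes", "scholarship": "yes", "interest": "continue"},
    {"job": "yes", "scholarship": "no", "interest": "continue"},
    {"job": "yes", "scholarship": "no", "interest": "continue"},
    {"job": "no", "scholarship": "yes", "interest": "continue"},
    {"job": "no", "scholarship": "no", "interest": "not_continue"},
    {"job": "no", "scholarship": "no", "interest": "not_continue"},
]
print(build_tree(data, ["job", "scholarship"], "interest"))
```

On this toy data `job` has the larger gain and becomes the root; the `job = no` branch is still mixed, so it is split again on `scholarship`.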

Description
The results of the classification of the processed data sets are as follows:

Conclusions
A decision tree is a model that helps in finding and making decisions about a problem by taking into account the various factors within its scope. With a decision tree, people can easily identify and see the relationships between the factors that affect a problem and can find the best solution by taking these factors into account. The results of the study, which applied the decision tree concept with the C4.5 method to a data set of 25 records, show that the ease of getting a job and scholarship attributes have a great influence on a student's decision to continue to college.