Naïve Bayes for Analysis of Student Learning Achievement

Student achievement is measured by the achievement index obtained each semester and is influenced by several factors. This research considers study paths, choice of majors, monthly living expenses, relationships with friends, relationships with family, study motivation, employment, scholarships, transportation, and internet services. Student achievement is analysed and predicted using the Naïve Bayes classification algorithm; the authors report that the algorithm works well, using 14 student records to predict the achievement of the 15th student. Based on the analysis, variables that affect student achievement include choice of majors, residence, relationships with friends, relationships with family, job, and scholarships. The accuracy of the Naïve Bayes algorithm for this student achievement case study model reaches 60%, with precision 25% and recall 100%.


Introduction
In higher education, students are expected to be achievers and to compete for good academic results, and the achievement index is the benchmark in that regard [1].
The achievement index (IP) is a student's average credit score, which indicates how successfully the student took part in the teaching and learning process during each semester. A high achievement index indicates that the student is able to follow lectures properly. A low achievement index means that the student is not following lectures well. A good achievement index brings several benefits: students can obtain scholarships, can complete their undergraduate studies faster because they are allowed to take more courses, and can find jobs more easily after they graduate [2].
According to the literature, student success in teaching and learning is related to gender, place of birth, student residence, socioeconomic level, and personal traits including basic abilities, attitudes, and appearance [3]. Learning motivation, in turn, is influenced by intrinsic factors, the quality of lectures, the weight of lecture material, lecture methods, the condition and atmosphere of lecture halls, and library facilities [4]. Other articles state that report cards, National Examination scores, entry paths, choice of majors, place of residence, study methods, monthly living costs, relationships with friends, relationships with family, and motivation to study are important factors influencing the student achievement index [1].
* Corresponding author: nurlela@unmus.ac.id
From the factors mentioned above, the authors take the following variables as material for analysing student achievement in this research: college entrance path, major choice, residence, study method, monthly living expenses, relationship with friends, relationship with family, motivation to learn, job, scholarship, transport, and internet service. This research aims to analyse student achievement using the Naïve Bayes algorithm.
Naïve Bayes is one of the classification algorithms used in data mining to predict a class label. Data mining searches a database for valid, useful, and understandable data patterns that can be used as knowledge [5]. Usually, knowledge is obtained from experts in a particular field and adapted into a computer program that makes decisions and provides information through reasoning [6]. The data used must also be of good quality, obtained through sound procedures for collecting, maintaining, disseminating, and regulating data [7]. Data mining tasks fall into several groups: Description, Estimation, Prediction, Classification, Clustering, and Association [8]. Classification is a method for finding a model that describes and distinguishes the classes of a data concept. The model is obtained by analysing the training dataset and is then used to predict the class label of an object whose class is not known [9]. Naïve Bayes, a classification algorithm with high accuracy and speed, is used here to classify achievement and to analyse the factors that affect student achievement.
Naïve Bayes is a classification algorithm, where classification is a method for examining the behaviour of grouped variable attributes. Naïve Bayes is derived from Bayes' theorem together with the assumption that the attributes or variables are independent of one another. It is an accurate algorithm and is fast when handling large databases. Naïve Bayes uses training data to obtain the probabilities of attributes or variables, which are then used to predict classes in a classification case. It works by looking at the frequency of each class in the training data and selecting the class with the greatest probability among the possible classes. An advantage of this algorithm is that it can classify even with a small amount of training data [10].
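For reference, the independence assumption and the "greatest probability" rule described above are usually written in the following standard textbook form. The symbols $C$ (class label) and $x_1, \dots, x_n$ (attribute values) are notation chosen here for illustration and are not necessarily the notation of the cited sources.

```latex
P(C \mid x_1, \dots, x_n)
  = \frac{P(C)\, P(x_1, \dots, x_n \mid C)}{P(x_1, \dots, x_n)}
  \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C),
\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

The denominator is the same for every class, so it can be dropped when only the most probable class is needed.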
Naïve Bayes is also considered to have good potential for classifying documents compared to other classification methods, in terms of both accuracy and computational efficiency. It performs classification by calculating simple probabilities based on the frequencies and combinations of values in the dataset. Naïve Bayes predicts future outcomes based on past experience. In the Naïve Bayes algorithm, the dataset must include an output value or label, so that the algorithm can observe the probability of each attribute and determine the output value or label for the data to be classified [11].
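The frequency-based calculation described above can be illustrated with a minimal sketch. It is not the RapidMiner procedure used in this research: the function names `train` and `predict`, the two attributes, and the four toy records are hypothetical and chosen only for illustration, and no smoothing is applied.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # (attribute, class) -> Counter of the values seen for that attribute
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(attr, label)][value] += 1
    return class_counts, value_counts

def predict(row, class_counts, value_counts):
    """Score each class as prior * product of attribute likelihoods."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior: relative frequency of the class
        for attr, value in row.items():
            # likelihood: share of this class's records with this value
            score *= value_counts[(attr, label)][value] / n
        scores[label] = score
    best = max(scores, key=scores.get)  # class with the greatest score
    return best, scores

# Hypothetical toy records, not the study's dataset.
rows = [
    {"scholarship": "yes", "job": "no"},
    {"scholarship": "yes", "job": "no"},
    {"scholarship": "no", "job": "yes"},
    {"scholarship": "no", "job": "no"},
]
labels = ["good", "good", "bad", "bad"]

class_counts, value_counts = train(rows, labels)
label, scores = predict({"scholarship": "yes", "job": "no"},
                        class_counts, value_counts)
print(label, scores)
```

Because no smoothing is used, a class receives a score of exactly 0 whenever one attribute value of the new record never occurred with that class in the training data.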
The Naïve Bayes equation used here is taken from [12].
In this research, the authors used data from 14 computer education students at the University of Musamus as training and testing data. Nine records are used for training and five for testing, and the data are analysed using the Rapid Miner application. Students are classified into a Bad class and a Good class. The Bad class contains students with an IPK (cumulative grade point average) below 3, while the Good class contains students with an IPK of 3 or above.
The variables or attributes used in this study are the twelve named earlier: college entrance path, major choice, residence, study method, monthly living expenses, relationship with friends, relationship with family, motivation to learn, job, scholarship, transport, and internet service.
In addition, for predicting student achievement, the 14 student achievement records are used to predict the achievement of a 15th student whose record does not yet have a class label.
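The labelling rule and the 9/5 split can be sketched as follows. The IPK values are hypothetical, and taking the first nine records for training is an assumption: the text does not state how records were assigned to the two sets.

```python
def label_student(ipk, threshold=3.0):
    """Good class for an IPK of 3 and above, Bad class below 3."""
    return "good" if ipk >= threshold else "bad"

def split(records, n_train=9):
    """Assumed split: first n_train records for training, the rest for testing."""
    return records[:n_train], records[n_train:]

# Hypothetical IPK values for 14 students, not the study's data.
ipk_values = [3.4, 2.8, 3.0, 3.6, 2.5, 3.1, 2.9, 3.2, 3.5,
              2.7, 3.3, 3.0, 2.6, 3.8]
class_labels = [label_student(v) for v in ipk_values]
train_set, test_set = split(class_labels)
print(len(train_set), len(test_set))
```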

Literature Review
Based on the Naïve Bayes analysis for predicting the 15th student's achievement, the probability value for the good class is 3.47 and the probability value for the bad class is 0. Because the value for the good class is greater than the value for the bad class, the 15th student is assigned the good class label.
Based on the analysis of each variable using Rapid Miner, for the college entrance path variable, more students fall in the bad class when their entrance path is independent.

Fig. 3. College entrance path analysis
For the major choice variable, students who chose computer education as their second choice are the students with good achievement:

Fig. 4. Major choice analysis
Based on place of residence, students who live in boarding houses are more accomplished. Based on the motivation to learn variable, students with moderate learning motivation are also able to achieve good results and IPK scores. Based on the job variable, students who do not work while studying have higher grades than students who work while studying.

Fig. 11. Job analysis
Based on Figure 12, many students who receive scholarships are more accomplished than students who do not. For the last variable, internet service, students need to be supported by good internet service, or at least have a personal data quota, in order to achieve well. The following is the result of calculating the accuracy of the Naïve Bayes algorithm on the student achievement dataset model.

Conclusion
The results show that the Naïve Bayes algorithm can be used to predict student achievement by learning from the student achievement dataset. The analysis of variables that affect student achievement is not yet complete, because the student dataset is small. Further work is needed on which variables truly affect student achievement, taking into account the needs and location of the case study and a more in-depth analysis of each student. Based on this research, the variables found to affect student achievement include the choice of majors, residence, relationships with friends, relationships with family, job, and scholarships. The accuracy of the Naïve Bayes algorithm for this student achievement case study model reached 60%, with precision 25% and recall 100%.
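Accuracy, precision, and recall of the kind reported above are computed from the counts of correct and incorrect predictions. The sketch below uses the standard definitions. The function name `evaluate`, the choice of "good" as the positive class, and the four example labels are assumptions for illustration and do not reproduce the study's test results.

```python
def evaluate(actual, predicted, positive="good"):
    """Return accuracy, precision, and recall for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical labels, not the study's test set.
actual = ["good", "bad", "bad", "good"]
predicted = ["good", "good", "bad", "good"]
accuracy, precision, recall = evaluate(actual, predicted)
print(accuracy, precision, recall)
```

Precision and recall depend on which class is treated as positive, so the same predictions give different values if "bad" is chosen instead.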