Issue | SHS Web Conf. Volume 102, 2021, The 3rd ETLTC International Conference on Information and Communications Technology (ETLTC2021)
---|---
Article Number | 04004
Number of page(s) | 6
Section | Applications in Computer Science
DOI | https://doi.org/10.1051/shsconf/202110204004
Published online | 03 May 2021
A Contemporary Machine Learning Method for Accurate Prediction of Cervical Cancer
1 Department of Computer Science, Bayero University, Kano, Nigeria
2 University of Aizu, Japan
3 Department of Software Engineering, Bayero University, Kano, Nigeria
* e-mail: tanimujessej@gmail.com
** e-mail: hamada@u-aizu.ac.jp
*** e-mail: mhassan.se@buk.edu.ng
**** e-mail: syilu.cs@buk.edu.ng
With the advent of new technologies in the medical field, large amounts of cancer data have been collected and made readily accessible to the medical research community. Over the years, researchers have applied advanced data mining and machine learning techniques to build models that analyze such datasets and extract patterns and hidden knowledge. The mined information can support decision making in diagnosis. Beyond predicting the outcomes of certain diseases, these techniques can discover patterns and relationships in complex datasets. In this research, we develop a model that predicts the outcome of a patient's cervical cancer test from risk factors in individual medical records and preliminary screening tests. The work applies a Decision Tree (DT) classifier and shows the benefit of feature selection, using recursive feature elimination (RFE) for dimensionality reduction, in improving the accuracy, sensitivity, and specificity of the model. The dataset contains missing values and is highly imbalanced; therefore, SMOTETomek, a combination of oversampling (SMOTE) and undersampling (Tomek links), was employed. A comparative analysis of the proposed model shows the effect of feature selection and class-imbalance handling on the classifier's accuracy, sensitivity, and specificity. The DT with the selected features and SMOTETomek performs better, with an accuracy of 98%, sensitivity of 100%, and specificity of 97%. The Decision Tree classifier is shown to perform very well on this classification task once the features are reduced and the class imbalance is addressed.
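SMOTETomek chains SMOTE oversampling of the minority class with the removal of Tomek links, i.e. pairs of opposite-class points that are each other's nearest neighbour. The abstract does not say how it was implemented; libraries such as imbalanced-learn provide it as `imblearn.combine.SMOTETomek`. The plain-Python sketch below only illustrates the idea under stated assumptions: binary labels, numeric feature tuples, Euclidean distance, a brute-force neighbour search, oversampling the minority class up to the majority count, and dropping both points of every Tomek link (one common cleaning variant). The helper names `smote`, `tomek_links`, and `smote_tomek` are hypothetical; this is not the authors' code.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """SMOTE: create n_new synthetic points, each on the segment between a
    random minority sample and one of its k nearest minority neighbours.
    Needs at least two minority samples when n_new > 0."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, of opposite-class points that are
    each other's nearest neighbour (brute-force O(n^2) search)."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return {(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]}


def smote_tomek(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class with SMOTE up to the majority count,
    then drop both members of every Tomek link (binary labels assumed)."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote(minority, max(n_majority - len(minority), 0), k, rng)

    X_res = list(X) + new_points
    y_res = list(y) + [minority_label] * len(new_points)

    # Tomek-link cleaning: remove both points of each cross-class pair.
    drop = {idx for pair in tomek_links(X_res, y_res) for idx in pair}
    keep = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in keep], [y_res[i] for i in keep]
```

For example, with 8 majority and 3 minority points in two well-separated clusters, `smote_tomek(X, y, minority_label=1, k=2)` returns 8 points of each class, since no Tomek links form between distant clusters. In practice, resampling is usually applied to the training split only, so that test metrics reflect the original class distribution.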
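The accuracy, sensitivity, and specificity quoted in the abstract are standard quantities read off a binary confusion matrix: accuracy = (TP + TN) / (TP + TN + FP + FN), sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP). A minimal plain-Python sketch, assuming binary labels with the positive (cancer) class encoded as 1; the function name `binary_metrics` and the toy labels are hypothetical and are not data from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from a binary confusion matrix.

    sensitivity = TP / (TP + FN)   (true-positive rate, recall)
    specificity = TN / (TN + FP)   (true-negative rate)
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against empty classes so the rates do not divide by zero.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity


# Toy labels (illustrative only): one negative case misclassified as positive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))  # -> (0.9, 1.0, 0.8333333333333334)
```

Here no positive case is missed, so sensitivity is 1.0, while the single false positive lowers specificity to 5/6. This mirrors why a model can report 100% sensitivity alongside a specificity below 100%.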
© The Authors, published by EDP Sciences, 2021
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.