Impact of oversampling algorithms in the classification of Guillain-Barré syndrome main subtypes


Oscar Chávez-Bosquez https://orcid.org/0000-0002-0324-9886
Manuel Torres-Vásquez https://orcid.org/0000-0001-8475-0914
José Hernández-Torruco https://orcid.org/0000-0003-3146-9349
Betania Hernández-Ocaña https://orcid.org/0000-0001-5700-7615

Abstract

Guillain-Barré syndrome (GBS) is a neurological disorder in which the body's immune system attacks the peripheral nervous system. The disease progresses rapidly and is the most frequent cause of acute flaccid paralysis. GBS has four main subtypes: Acute Inflammatory Demyelinating Polyneuropathy (AIDP), Acute Motor Axonal Neuropathy (AMAN), Acute Motor Sensory Axonal Neuropathy (AMSAN), and Miller-Fisher Syndrome. Identifying a patient's GBS subtype is decisive because treatment differs for each subtype. This study had two objectives: to determine which oversampling algorithm most improves classifier performance, and to determine whether balancing the data improves the performance of the predictive models at all. Three oversampling methods, Random Over-Sampling (ROS), Synthetic Minority Over-sampling Technique (SMOTE), and Adaptive Synthetic sampling (ADASYN), were applied to the minority class. Three classifiers were used: C4.5, Support Vector Machine (SVM), and JRip. Model performance was evaluated with the ROC curve. The results show that balancing the dataset improves the performance of the predictive models. SMOTE was the best balancing method, combined with JRip under the one-versus-one (OVO) decomposition and with C4.5 under the one-versus-all (OVA) decomposition.
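To make the balancing step concrete, the following is a minimal sketch, in plain Python with no external libraries, of how a single one-versus-all (OVA) binary task could be balanced with ROS or SMOTE. It is not the authors' implementation: the function names (`one_vs_all`, `random_oversample`, `smote`, `balance`), the Euclidean distance, the value of k, the toy feature values, and the subtype labels used as class names (AIDP, AMAN, AMSAN, MF) are all illustrative assumptions. This simplified SMOTE variant picks the base example at random and places each synthetic point at a uniformly random position on the segment joining it to one of its k nearest minority-class neighbours. ROS, by contrast, only duplicates existing minority rows. ADASYN, which generates more synthetic points for minority examples that are harder to learn, is not shown.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_vs_all(y, target):
    """OVA binarization: 1 for the target subtype, 0 for every other subtype."""
    return [1 if label == target else 0 for label in y]


def random_oversample(minority, n_new, rng):
    """ROS: replicate randomly chosen minority examples (no new feature values)."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote(minority, n_new, k, rng):
    """Simplified SMOTE: interpolate between a randomly chosen minority
    example and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority examples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: euclidean(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, method="smote", k=5, seed=0):
    """Oversample the minority label of a binary task until both labels
    have the same count. Returns new (X, y) lists; inputs are not modified."""
    rng = random.Random(seed)
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("balance() expects exactly two distinct labels")
    (_, n_maj), (min_label, n_min) = counts.most_common()
    n_new = n_maj - n_min
    if n_new == 0:
        return list(X), list(y)
    minority = [x for x, label in zip(X, y) if label == min_label]
    if method == "ros":
        new = random_oversample(minority, n_new, rng)
    elif method == "smote":
        new = smote(minority, n_new, k, rng)
    else:
        raise ValueError(f"unknown method: {method}")
    return list(X) + new, list(y) + [min_label] * n_new


# Toy data (made up, not the study's dataset): two numeric features per patient.
X = [[0.1, 1.0], [0.2, 1.1], [0.3, 0.9], [0.9, 0.2], [1.0, 0.1],
     [0.8, 0.3], [0.95, 0.25], [0.85, 0.15], [0.5, 0.5], [0.55, 0.45]]
y = ["AMSAN", "AMSAN", "AMSAN", "AIDP", "AIDP", "AIDP", "AMAN", "AMAN", "MF", "MF"]

y_ova = one_vs_all(y, "AMSAN")  # 3 positives vs 7 negatives
X_bal, y_bal = balance(X, y_ova, method="smote", k=2)
print(sorted(Counter(y_bal).items()))  # [(0, 7), (1, 7)]
```

Because SMOTE's synthetic points are interpolations of real patients, a step like this would normally be applied only to the training portion of each cross-validation split, so that the examples used to draw the ROC curve remain real. Under OVO, the same `balance` call would instead be made once per pair of subtypes.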
PDF (Spanish)
