An Efficient Predictive Model for Myocardial Infarction Using Cost-sensitive J48 Model
Abstract
Background: Myocardial infarction (MI) involves the death of heart muscle and can cost human lives, a cost far higher than that of treatment. This study aimed to present an MI prediction model using classification data mining methods that account for the imbalanced nature of the problem.
Methods: We enrolled 455 healthy individuals and 295 myocardial infarction cases from among visitors to Shahid Madani Specialized Hospital, Khorramabad, Iran, in 2015. A hybrid feature selection method combining Weight by Relief and a genetic algorithm was then applied to the dataset to select the best features. After feature selection, the MetaCost classifier was applied to the sampled dataset. MetaCost built a cost-sensitive J48 model by assigning different cost ratios to misclassified cases: 1:10, 1:50, 1:100, 1:150, and 1:200.
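The effect of a misclassification cost ratio can be illustrated with the minimum-expected-cost rule that cost-sensitive methods such as MetaCost rely on. The following is a minimal sketch, not the study's implementation: it assumes each ratio is read as (cost of a false positive):(cost of a false negative), that correct predictions cost nothing, and that some classifier has already produced an estimate of P(MI | x). The function names are hypothetical.

```python
# Sketch of the minimum-expected-cost decision rule for a two-class problem.
# Assumptions (not stated in the source): the ratio 1:200 means
# cost_fp = 1 for a healthy case flagged as MI and cost_fn = 200 for a
# missed MI case; correct predictions have zero cost.

def expected_costs(p_mi, cost_fp, cost_fn):
    """Return (cost of predicting healthy, cost of predicting MI) for P(MI|x) = p_mi."""
    cost_predict_healthy = p_mi * cost_fn        # wrong only if the patient has MI
    cost_predict_mi = (1.0 - p_mi) * cost_fp     # wrong only if the patient is healthy
    return cost_predict_healthy, cost_predict_mi


def min_cost_label(p_mi, cost_fp=1.0, cost_fn=200.0):
    """Pick the label with the lower expected misclassification cost."""
    healthy_cost, mi_cost = expected_costs(p_mi, cost_fp, cost_fn)
    return "MI" if mi_cost <= healthy_cost else "healthy"


def decision_threshold(cost_fp, cost_fn):
    """Probability of MI above which predicting MI is the cheaper choice."""
    return cost_fp / (cost_fp + cost_fn)


if __name__ == "__main__":
    # The five ratios listed in the Methods paragraph.
    for cost_fn in (10, 50, 100, 150, 200):
        threshold = decision_threshold(1.0, cost_fn)
        print(f"ratio 1:{cost_fn} -> predict MI when P(MI|x) >= {threshold:.4f}")
```

Under these assumptions, raising the ratio lowers the probability at which a case is labeled MI, which trades more false positives for fewer missed infarctions.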
Results: When the model was applied to the imbalanced dataset, the 1:200 cost ratio gave the best results compared with models built without feature selection or without cost-sensitive learning. The model achieved a sensitivity, F-measure, and accuracy of 86.67%, 80%, and 82.67%, respectively.
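For readers who want to see how the three reported measures relate to a confusion matrix, here is a short sketch using their standard definitions. The counts are hypothetical: the source does not give its confusion matrix, and these values were chosen only because they reproduce the reported percentages.

```python
# Standard definitions of sensitivity, F-measure and accuracy.
# The counts passed in below are hypothetical and are NOT taken from the study.

def classification_metrics(tp, fn, fp, tn):
    """Compute sensitivity (recall), precision, F-measure and accuracy."""
    sensitivity = tp / (tp + fn)                 # share of MI cases detected
    precision = tp / (tp + fp)                   # share of MI predictions that are correct
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fn + fp + tn)   # share of all cases classified correctly
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts for a 75-case evaluation set (illustration only).
metrics = classification_metrics(tp=26, fn=4, fp=9, tn=36)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.2%}")
```

With these illustrative counts, sensitivity is 26/30, F-measure is 52/65, and accuracy is 62/75, which round to the 86.67%, 80%, and 82.67% reported above.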
Conclusion: Experiments on the real dataset showed that using the cost-sensitive method together with the hybrid feature selection method improved model performance. Therefore, the model can be considered a reliable myocardial infarction prediction model.
Issue: Vol 46 No 5 (2017)
Section: Original Article(s)
Keywords: Myocardial infarction; Heart disease; Metacost; Cost-sensitive J48; Weight by relief
Rights and permissions: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.