Implementation of the Naive Bayes Algorithm to Predict the Safety of Heart Failure Patients

Heart disease is a leading contributor to global mortality, according to data released by the World Health Organization (WHO). In 2019 alone, an estimated 17.9 million people died of cardiovascular disease, accounting for 32% of all deaths worldwide. Of these deaths, 85% were attributed to heart attack and stroke. People at risk of heart failure often persist in unhealthy lifestyles, whether or not they are aware of an underlying heart condition. To address this issue, this research explores the application of machine learning to the classification of heart failure patients using the Naive Bayes technique. The algorithm has been used extensively in the health sector, where it has successfully classified conditions such as hepatitis, stroke, and respiratory infections. In this study, Naive Bayes classified heart failure patients with an accuracy of 74.58%, a precision of 97.67%, a sensitivity of 75%, and an AUC (Area Under the ROC Curve) of 0.857, which falls within the 0.80 to 0.90 range classed here as excellent classification. These findings can support an early warning system for individuals at risk of heart failure.


Introduction
Heart failure is a critical health concern, associated with the highest mortality rate in both developed and developing nations. The global impact is considerable: an estimated 17.9 million people died of cardiovascular disease in 2019 alone, accounting for 32% of all deaths worldwide [1] [2]. The predominant contributors to this figure are heart attacks and strokes, which together constitute 85% of cardiovascular-related deaths. Notably, the terminal phase of any heart attack is characterized by heart failure, a condition in which the heart loses its capacity to pump blood efficiently enough to meet the body's physiological demands [3].
In Indonesia, the Basic Health Research (Riskesdas) of the Research Agency for Health Development of the Indonesian Ministry of Health [4] reported that the prevalence of heart failure based on a doctor's diagnosis was estimated at 0.13%, or 229,696 people, while the prevalence based on diagnosis and symptoms was estimated at 0.3%, or 530,068 people [5]. The total number of heart failure patients in 2013 was therefore 759,764. By comparison, Riskesdas 2018 reported that the prevalence of heart disease based on a doctor's diagnosis among residents of all ages in each province reached 1.5%, or 1,017,290 people. Between the 2013 and 2018 Riskesdas results, the number of people with heart disease thus increased by 33.89% [6]. The high number of people with heart disease and the rising mortality rate, both globally and in Indonesia, raise an important question: "How can we turn past patient clinical data into useful information to support health practitioners' decisions in treating heart failure patients?" In many healthcare settings, information systems predominantly serve operational functions such as inpatient billing and inventory management and lack decision-support capabilities for patient care [7]. The conventional approach relies heavily on the clinical expertise of experienced physicians and often sidelines the data held in databases, which could be a rich source of information if harnessed effectively through data mining techniques. Notably, researcher Robert Wu advocates a paradigm shift by suggesting the integration of decision-support information systems with historical patient records. This integration [8] has the potential to mitigate medical treatment errors, enhance patient safety, reduce practice-related mistakes, and improve overall patient outcomes.

IICS SEMNASTIK
Given the significance of predictive analysis based on historical data, the Naive Bayes algorithm is a favorable choice because it can estimate the probability of future outcomes while requiring only a small amount of training data to estimate the parameters needed for classification [9], [10]. This advantage prompted the researchers to adopt the Naive Bayes data mining method as the focal point of this study. By applying this method, the study aims to bridge the existing gap in healthcare information systems, emphasizing the potential for better decision-making, fewer errors, and, ultimately, improved patient care [11], [12].
To give a fuller perspective on the research methodology, this study also reviews data mining techniques other than Naive Bayes for comparison. In one study focused on detecting heart disease, the k-Nearest Neighbor (K-NN) classification algorithm was employed. Using a dataset different from the one used in this study, that research applied the K-NN algorithm with K set to 9. The outcomes showed an accuracy rate of 70% in identifying heart disease patients. Additionally, the Area Under the Curve (AUC) value was 0.875, indicating excellent classification performance in heart disease detection [13], [14].
That research applied the K-NN algorithm to heart disease detection, and its results are compared with the findings obtained from the Naive Bayes algorithm in the present study [15], [16], [17]. By including other data mining techniques for comparison, the study aims to identify the strengths and limitations of each algorithm and to clarify their applicability in healthcare decision support [18]. This approach strengthens the research findings and contributes to a more thorough evaluation of the chosen Naive Bayes algorithm in the specific context of heart failure prediction.
There is also research on heart disease prediction that uses several data mining techniques and datasets different from those in this study [19]. The techniques used there are the Naive Bayes, Decision Tree, and Neural Network methods. All three data mining models can answer complex questions and provide detailed information [20]. However, Naive Bayes answered four of the five research objectives, whereas the Decision Tree answered only three and the Neural Network only two [21].

Research Method
The framework created as a reference and guideline for this research activity [22] is as follows:
1. Data Collection: The dataset used in this study is data on heart failure patients from the Faisalabad Institute of Cardiology and Allied Hospital in Faisalabad, collected from April 2015 to December 2015 [23]. The data consist of 299 patient records with 13 attribute columns; 203 patients survived and 96 did not [24]. Of the 13 attribute columns in this dataset, the researchers retained 12 for this study, which can be seen in Table 1. The target attribute records whether the patient died during the follow-up period (Safe/Unsafe).
2. Data Preprocessing: The dataset columns to be used are checked, for example by ensuring the completeness of the data attributes and removing outliers (data that differ significantly from the other data) [25].
3. Data Transformation: The cleaned data are classified (grouped into categories), which is the first stage of the calculation using the Naive Bayes method [26][27].
4. Application of Naive Bayes: After the data are classified, they are divided into training and testing data. The researchers used 80% of the data for training (240 of the 299 records) and 20% for testing (59 of the 299 records) [28]. The workflow for the Naive Bayes method is as follows:
a. Calculate the initial (prior) probability of each class.
b. Calculate the probability of each attribute value within a class.
c. Multiply all of the attribute probabilities for a class by the probability of that class.
d. Compare the results between classes.
Naive Bayes is applied to all of the testing data to determine the final probability for each record, which is then placed into one of the existing classes (the classes in this study are "Safe" and "Unsafe").
5. Analysis of Results: After obtaining the Naive Bayes predictions for the testing data in the "Safe" and "Unsafe" classes, the researchers measured the accuracy of the survival prediction for heart failure patients [29].
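As an illustration only, steps a to d of the Naive Bayes workflow can be sketched in Python for categorical attributes. This is a minimal sketch and not the researchers' implementation: the function names, the toy records, and the attribute names (age_group, anaemia) are hypothetical, and the sketch assumes the attributes have already been transformed into categories. It uses plain relative frequencies without smoothing, so an attribute value never observed with a class gives that class a score of zero.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count classes (for step a) and attribute values per class (for step b)."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class_label)][value] -> number of records
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for index, value in enumerate(row):
            value_counts[(index, label)][value] += 1
    return class_counts, value_counts


def class_scores(row, class_counts, value_counts):
    """Step c: multiply the class prior by every attribute probability."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # step a: prior probability of the class
        for index, value in enumerate(row):
            # step b: relative frequency of this value within the class
            score *= value_counts[(index, label)][value] / count
        scores[label] = score
    return scores


def predict(row, class_counts, value_counts):
    """Step d: compare the class scores and keep the largest."""
    scores = class_scores(row, class_counts, value_counts)
    return max(scores, key=scores.get)


# Hypothetical toy records (NOT taken from the dataset): (age_group, anaemia)
toy_rows = [
    ("old", "yes"), ("old", "yes"), ("old", "no"),
    ("young", "no"), ("young", "no"), ("young", "yes"),
]
toy_labels = ["Unsafe", "Unsafe", "Safe", "Safe", "Safe", "Safe"]

class_counts, value_counts = train_naive_bayes(toy_rows, toy_labels)
scores = class_scores(("old", "yes"), class_counts, value_counts)
prediction = predict(("old", "yes"), class_counts, value_counts)
print(scores, prediction)
```

For the toy record ("old", "yes"), the "Unsafe" score is 2/6 × 2/2 × 2/2 = 1/3 and the "Safe" score is 4/6 × 1/4 × 1/4 = 1/24, so the record is placed in the "Unsafe" class.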

Data Collection Results
The dataset from the Faisalabad Institute of Cardiology and Allied Hospital in Faisalabad, covering April 2015 to December 2015, contains 299 patient records with 13 variable columns [30]. However, this study used only 12 of the variable columns.

Research Results
The dataset then goes through the Data Preprocessing stage, where the columns to be used are checked and the completeness of the data variables is ensured. After that, the data are classified first (data transformation) so that they can be processed using the Naive Bayes data mining method. Once the prediction results have been produced, they are evaluated with the Confusion Matrix to obtain the accuracy, precision, and sensitivity of the results.
2. Measuring the predictions of heart failure patient safety with the Confusion Matrix gave an accuracy of 74.58%, a precision of 97.67%, and a sensitivity of 75%, using 80% of the 299 heart failure patient records as training data and 20% as testing data.
3. The prediction results obtained from applying the Naive Bayes method can be used as supporting material for health practitioners in assessing patient safety, but they must not be used as the final reference for decisions. If a patient's prediction falls into the "Unsafe" (deceased) class, it is recommended that the patient be examined immediately and receive special medical treatment.
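The three Confusion Matrix measures reported above can be computed as in the following minimal sketch. The confusion matrix itself is not given in the text, so the counts used here (42 true positives, 1 false positive, 14 false negatives, 2 true negatives) are hypothetical: they are chosen only because they sum to the 59 testing records and reproduce the reported percentages, and the actual counts may differ. The function name is also illustrative.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, sensitivity) from confusion matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total   # share of all predictions that are correct
    precision = tp / (tp + fp)     # share of positive predictions that are correct
    sensitivity = tp / (tp + fn)   # share of actual positives that are found
    return accuracy, precision, sensitivity


# Hypothetical counts for the 59 testing records (an assumption, not source data).
accuracy, precision, sensitivity = confusion_metrics(tp=42, fp=1, fn=14, tn=2)
print(f"accuracy={accuracy:.2%} precision={precision:.2%} sensitivity={sensitivity:.2%}")
```

With these counts, accuracy is 44/59, precision is 42/43, and sensitivity is 42/56, which round to 74.58%, 97.67%, and 75%.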
This research and its calculations still require further development to obtain the best possible results, so the author offers the following suggestions:
1. Future work can use several variations in the distribution of training and testing data, in addition to the 80% training and 20% testing distribution used by the researchers, to explore the optimal data split for the Naive Bayes data mining method.
2. The researchers recommend using data mining methods other than Naive Bayes, because not every problem has to be solved with a single data mining algorithm. Other data mining methods should therefore be explored and compared to determine the most accurate algorithm.
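The first suggestion could be explored along the lines of the sketch below, which generates training and testing sets for several split percentages. It is a sketch under assumptions: the helper names, the random seed, and the list of fractions are hypothetical, and rounding the testing size down is an assumption that happens to reproduce the 240/59 split reported for the 80%/20% distribution. Each resulting pair of sets would then be passed to the chosen classifier and scored with the Confusion Matrix.

```python
import random


def split_sizes(n_records, test_fraction):
    """Return (n_train, n_test), rounding the testing size down (assumption)."""
    n_test = int(n_records * test_fraction)
    return n_records - n_test, n_test


def shuffled_split(records, test_fraction, seed=0):
    """Shuffle a copy of the records and cut it into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train, _ = split_sizes(len(shuffled), test_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Record indices stand in for the 299 patient records.
records = range(299)
for fraction in (0.1, 0.2, 0.3, 0.4):
    train, test = shuffled_split(records, fraction)
    print(f"test fraction {fraction}: {len(train)} training, {len(test)} testing")
```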

Table 1. Heart Failure Patient Dataset Attributes