2.2. CHARACTERISTICS OF MACHINE LEARNING ALGORITHMS
Supervised machine learning techniques are applicable in numerous domains. In general, Support Vector Machines and neural networks tend to perform much better on multidimensional and continuous features (Agarwal & Sagar, 2019). On the other hand, logic-based systems tend to work better with discrete/categorical features. Neural network models and Support Vector Machines require a large sample size to achieve maximum prediction accuracy, while Bayesian networks may need only a relatively small data set.
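As an illustration of this sample-size contrast (not an experiment from this paper), the following minimal scikit-learn sketch trains an SVM and Gaussian Naive Bayes on growing slices of a synthetic data set; all sizes and figures are assumptions for demonstration only:

```python
# Illustrative sketch: accuracy vs. training-set size for SVM and Naive Bayes.
# Synthetic data; the specific numbers are not from the paper.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

for n in (50, 200, 1000, len(X_train)):
    svm = SVC().fit(X_train[:n], y_train[:n])          # needs many samples
    nb = GaussianNB().fit(X_train[:n], y_train[:n])    # plateaus early
    print(f"n={n:4d}  SVM={svm.score(X_test, y_test):.3f}"
          f"  NB={nb.score(X_test, y_test):.3f}")
```

On such runs, Naive Bayes typically approaches its best accuracy with a small fraction of the data, whereas the SVM keeps improving as more samples are added.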
There is general agreement that the k-nearest neighbor (kNN) algorithm is very sensitive to irrelevant features: this sensitivity follows directly from the way the algorithm works, since every feature contributes equally to the distance computation. Likewise, the presence of irrelevant features can make the training of a neural network very inefficient, even impractical. Most decision tree algorithms cannot handle problems that require diagonal partitions (Sathya & Abraham, 2013): each split of the instance space is orthogonal to the axis of one variable and parallel to all other axes, so the resulting regions after partitioning are all hyper-rectangles. Artificial neural networks and support vector machines work well when multicollinearity is present and there is a non-linear relationship between the input and output features.
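The kNN sensitivity can be made concrete with a minimal sketch (scikit-learn assumed; the noise dimensions and scores are illustrative): appending pure-noise columns dilutes the distance metric and typically lowers cross-validated accuracy.

```python
# Illustrative sketch: kNN accuracy before and after adding irrelevant features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=500, n_features=5, n_informative=5,
                           n_redundant=0, random_state=0)
X_noisy = np.hstack([X, rng.randn(500, 50)])  # 50 pure-noise features

knn = KNeighborsClassifier(n_neighbors=5)
print("informative features only:", cross_val_score(knn, X, y).mean())
print("with 50 noise features:   ", cross_val_score(knn, X_noisy, y).mean())
```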
Naive Bayes (NB) requires little storage space during both the training and classification stages: the strict minimum is the memory needed to store the prior and conditional probabilities. The basic kNN algorithm uses a large amount of storage space for the training phase (Cao et al., 2019), and its execution space is at least as large as its training space. On the contrary, for all non-lazy learners, execution space is usually much smaller than training space, since the resulting classifier is often a very condensed summary of the data. Moreover, Naive Bayes and kNN can easily be used as incremental learners, whereas rule-based algorithms cannot.
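As a sketch of the incremental-learning point (assuming scikit-learn's `partial_fit` interface; batch sizes are arbitrary), Naive Bayes can update its stored statistics batch by batch, so the full training set never has to be held in memory at once:

```python
# Illustrative sketch: incremental Naive Bayes via partial_fit.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
classes = np.unique(y)  # partial_fit needs the full label set on the first call

nb = GaussianNB()
for start in range(0, len(X), 500):   # feed the data in 500-row batches
    batch = slice(start, start + 500)
    nb.partial_fit(X[batch], y[batch], classes=classes)

print("accuracy on training data:", nb.score(X, y))
```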
Naive Bayes is naturally robust to missing values, since these are simply ignored in the probability calculations and therefore have no impact on the final decision. On the contrary, kNN and neural networks require complete records to do their job.
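The mechanism is easy to see in a hand-rolled Gaussian NB scorer (a sketch, not the paper's implementation, with toy parameters): a feature that is NaN is skipped in the sum of log-probabilities, so it contributes nothing to the decision.

```python
# Illustrative sketch: a NaN feature is simply skipped in the NB score.
import numpy as np

def nb_predict(x, priors, means, stds):
    """Gaussian NB for one sample; missing (NaN) features are ignored."""
    best_class, best_score = None, -np.inf
    for c in priors:
        score = np.log(priors[c])
        for j, v in enumerate(x):
            if np.isnan(v):           # missing value: skip this feature
                continue
            score += -0.5 * ((v - means[c][j]) / stds[c][j]) ** 2 \
                     - np.log(stds[c][j])
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy parameters for two classes over three features (illustrative numbers).
priors = {0: 0.5, 1: 0.5}
means  = {0: [0.0, 0.0, 0.0], 1: [2.0, 2.0, 2.0]}
stds   = {0: [1.0, 1.0, 1.0], 1: [1.0, 1.0, 1.0]}

print(nb_predict([1.8, np.nan, 2.1], priors, means, stds))  # -> 1 despite the NaN
```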
Finally, decision trees and NB generally have different operational profiles: when one is very accurate, the other often is not, and vice versa. In contrast, decision trees and rule classifiers have a similar operational profile. SVM and ANN also have a similar operational profile.
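One practical consequence of such complementary profiles (the combination below is a sketch motivated by this observation, not a method from the paper) is that soft-voting over a decision tree and Naive Bayes lets each model compensate where the other is weak:

```python
# Illustrative sketch: combining classifiers with complementary profiles.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
ensemble = VotingClassifier(
    estimators=[("dt", DecisionTreeClassifier(random_state=0)),
                ("nb", GaussianNB())],
    voting="soft",   # average the predicted class probabilities
)
print("ensemble CV accuracy:", cross_val_score(ensemble, X, y).mean())
```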