3C Tecnología. Glosas de innovación aplicadas a la pyme. ISSN: 2254 – 4143 Edición Especial Special Issue Mayo 2021
MACHINE LEARNING MODEL TO PREDICT THE
DIVORCE OF A MARRIED COUPLE
Nahum Flores
Student, Articial Intelligence Group, Faculty in Systems Engineering and Computer Science,
National University of San Marcos. Lima, (Peru).
E-mail: nahum.ores@unmsm.edu.pe
ORCID: https://orcid.org/0000-0002-5807-4323
Sandra Silva
Student, Articial Intelligence Group, Faculty in Systems Engineering and Computer Science,
National University of San Marcos. Lima, (Peru).
E-mail: sandra.silva3@unmsm.edu.pe
ORCID: https://orcid.org/0000-0003-0253-7883
Received: 09/12/2020
Accepted: 10/03/2021
Published: 07/05/2021
Suggested citation:
Flores, N., & Silva, S. (2021). Machine learning model to predict the divorce of a married couple.
3C Tecnología. Glosas de innovación aplicadas a la pyme, Special Issue (May 2021), 83-95.
https://doi.org/10.17993/3ctecno.2021.specialissue7.83-95
ABSTRACT
Divorce usually affects the closest family members. The divorce rate has increased
dramatically over the years, especially in the last two decades, and has worsened with the
pandemic, during which many countries have recorded a significant increase. We draw on the
work of Yöntem et al., who propose 54 questions as predictors of divorce. We use 4 machine
learning models (perceptron, logistic regression, neural network and random forest) and 3
hybrid models based on voting criteria. Each of these models was trained in 5 different
scenarios, for a total of 35 experiments. The best performance obtained in terms of accuracy,
sensitivity and specificity is 0.9853, 1.0 and 0.9667 respectively, corresponding to the
perceptron model and a hybrid model. Although the results show high performance, the context,
the amount of data and the country in which the data were collected must be considered.
KEYWORDS
Machine learning, Neural networks, Divorce prediction, Voting.
1. INTRODUCTION
The divorce rate worldwide has increased dramatically in recent years, and the figures
support this assertion. In the United States, a comparison of the figures for 2018 with
those of 1900 shows that there are four times more divorced women (Schweizer, 2020).
In Spain, the rate in 2018 was double that of 2000 (INE, 2018). In Mexico, it tripled
from 2000 to 2019 (INEGI, 2019), and in Peru the number of divorces registered in
2018 (INEI, 2018) was eight times higher than the number registered in 2000 (INEI, 2010).
The current pandemic context has only exacerbated this phenomenon, as confinement has
brought a further increase. In the United States, couples with at least 5 years of marriage
registered 16% more divorces in the third quarter of 2020 than in the same period of 2019,
and divorces among couples with children under 18 increased by 5% (Legal Templates, 2020).
An even more noticeable case occurs in Xi’an (China), where divorce requests have increased
to such an extent that they have reached their daily limit (Díez, 2020). Naturally, this
situation has consequences that can affect close members of the families involved
(Sánchez, 2019).
In this regard, different studies have identified multiple factors to predict divorce. One of
the most significant works is that of Gottman, who identified the "Four Horsemen of the
Apocalypse" that can end a marriage: criticism, contempt, stonewalling and defensiveness
(Gottman & Silver, 2014). Using just these four variables in a longitudinal study conducted
with newlywed couples, Gottman estimated which couples would have an early divorce with
85% accuracy. Gottman also found that sexual satisfaction, love and passion in a marriage
depend directly (by 70%) on the quality of the couple's friendship (Gottman & Silver, 2015).
Other studies point to infidelity as the main ground for divorce. Infidelity is the leading
cause of divorce in the United States (Mark, Janssen, & Milhausen, 2011) as well as in more
than 160 cultures (Betzig, 1989), because it has negative effects on the relationship and
can be the most feared and devastating experience in a marriage (Pittman, 1994), thus
leading to its end (Zordan & Strey, 2011).
In the last decade, the use of Machine Learning models in psychology has become popular,
leaving behind numerous methods of estimation, statistical analysis and data mining for
predictions. First, Amiriparian et al. (2019) used audio spectrograms to diagnose
bipolar disorder, obtaining an Unweighted Average Recall (UAR) of 46.2%. Second, Joel,
Eastwick, and Finkel (2017) used a random forest method to predict the appearance of a
relationship based on traits and preferences; out of 192 couples, it was able to predict
4% to 18% of the actor variance (the average tendency to romantically desire other people)
and 7% to 27% of the partner variance (the tendency to be desired by other people).
Finally, Flesia's work predicted the stress levels caused by COVID-19 from 18 psychosocial
variables, achieving a sensitivity of 76%.
For the particular case of divorce prediction, we reviewed the following prior work:
Großmann et al. (2019), who used a linear regression model to predict the future of a
relationship based on the analysis of personality traits; Yöntem et al. (2019), whose ANN
models achieved a precision of 98.85%; and Narendran, Abilash & Charulatha (2020), who used
a voting classifier with decision tree, bagging classifier and XGBoost prediction models,
achieving a performance of 94.14%.
Using more recent classification methods, the present work aims to compare against the high
performance obtained with an analysis based on the correlation of variables, and makes the
proposed models and their training results available.
2. MATERIALS AND METHODS
2.1. DATASET
In this research we use the same dataset as Yöntem et al. (2019), which is composed of 54
questions; 6 of them are shown in Table 1. The questions were answered by 170 people (84
divorced and 86 married). As a divorce predictor, each question has a different probability
of impact. Answers are on a 5-point scale (0 = Never, 1 = Rarely, 2 = Average,
3 = Often, 4 = Always).
Table 1. Questions formulated in Yöntem's work.
ID      Question
Atr1    If one of us apologizes when our discussion deteriorates, the discussion ends.
Atr2    I know we can ignore our differences, even if things get hard sometimes.
Atr3    When we need it, we can take our discussions with my spouse from the beginning and correct it.
Atr52   I wouldn't hesitate to tell my spouse about her/his inadequacy.
Atr53   When I discuss, I remind my spouse of her/his inadequacy.
Atr54   I'm not afraid to tell my spouse about her/his incompetence.
Source: adapted from (Yöntem et al., 2019).
2.2. PREPROCESSING
Data normalization is a preprocessing approach in which the data are scaled or transformed
so that each feature contributes equally, which can significantly improve the performance
of Machine Learning algorithms (Singh & Singh, 2019). In this work, the answers to the 54
questions are numerical values between 0 and 4, which were rescaled to the range -1 to 1,
as shown in Figure 1.
Figure 1. Normalization of the answers.
Source: own elaboration.
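The rescaling can be illustrated with a minimal sketch. It assumes a linear (min-max) mapping
from the answer scale [0, 4] onto [-1, 1]; the paper gives only the input and output ranges,
so the exact transformation and the function name `rescale_answer` are assumptions.

```python
def rescale_answer(value, low=0, high=4):
    """Map an answer on the [low, high] scale linearly onto [-1, 1]."""
    if not low <= value <= high:
        raise ValueError(f"answer {value} is outside [{low}, {high}]")
    return 2 * (value - low) / (high - low) - 1


# One hypothetical respondent's answers to five questions.
answers = [0, 1, 2, 3, 4]
scaled = [rescale_answer(a) for a in answers]
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 1.0]
```

Under this mapping the middle answer (2 = Average) lands on 0, so "Never" and "Always" are
symmetric around it.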
2.3. FEATURE SELECTION
Considering each question as a feature, we use Pearson's correlation for the selection,
which measures the degree of relationship between variables (Liu et al., 2020). Table
2 shows the 20 variables with the highest correlation.
Table 2. Questions with the highest correlation.
Id      Score     Id      Score
Atr22   0.7853    Atr42   0.6423
Atr54   0.7685    Atr48   0.6336
Atr28   0.7621    Atr53   0.6114
Atr44   0.7530    Atr47   0.5827
Atr34   0.7498    Atr52   0.5755
Atr32   0.7397    Atr45   0.5102
Atr50   0.7254    Atr43   0.4822
Atr31   0.6992    Atr7    0.4280
Atr51   0.6841    Atr46   0.4003
Atr49   0.6748    Atr6    0.2871
Source: own elaboration.
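A correlation-based ranking of this kind can be sketched as follows. The sketch assumes that
each question is scored by the absolute Pearson correlation between its answers and the class
label; the paper does not state which pair of variables is correlated, and the toy data and
the names `AtrA`, `AtrB` and `AtrC` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, labels, top_k):
    """Return the top_k (name, score) pairs by descending |r| with the label."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Toy data (hypothetical): 6 respondents, 3 questions, label 1 = divorced.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "AtrA": [1.0, 0.5, 1.0, -1.0, -0.5, -1.0],  # tracks the label closely
    "AtrB": [0.5, 0.5, 0.0, 0.0, -0.5, 0.5],    # weakly related
    "AtrC": [0.0, 1.0, -1.0, 1.0, -1.0, 0.0],   # unrelated
}
ranking = rank_features(columns, labels, top_k=2)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

With the real dataset, `top_k=20` would correspond to the number of variables listed in
Table 2.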
2.4. CLASSIFICATION
For the classification, this work uses four Machine Learning models. The first is a
Perceptron with a stop criterion of 1e-4. The second is a logistic regression with lbfgs
as the optimization solver. The third is a neural network composed of 7 layers, as shown
in Figure 2, all with a sigmoid activation function, trained for 30 epochs. The fourth is
a Random Forest with 100 estimators and a depth of 2. Hybrid models based on voting
criteria generally have superior performance (Kuncheva & Rodríguez, 2012; Liu, Reviriego,
Lombardi, & Hernandez, 2020), so 3 hybrid models were created from the 4 models mentioned.
The classification models are listed in Table 4.
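To make the role of a stop criterion concrete, the following is a minimal pure-Python
perceptron, not the authors' implementation. It assumes that the 1e-4 value is a tolerance
on the improvement of the per-epoch error rate, which is one common reading; the toy data,
the label encoding (-1 = married, +1 = divorced) and the function names are hypothetical.

```python
def train_perceptron(X, y, tol=1e-4, max_epochs=1000):
    """Train a perceptron on labels in {-1, +1}.

    Training stops when the per-epoch error rate fails to improve by at
    least `tol` from one epoch to the next, or after `max_epochs`.
    """
    weights = [0.0] * len(X[0])
    bias = 0.0
    previous_loss = float("inf")
    for _ in range(max_epochs):
        mistakes = 0
        for features, target in zip(X, y):
            score = sum(w * f for w, f in zip(weights, features)) + bias
            if target * score <= 0:  # misclassified, or exactly on the boundary
                weights = [w + target * f for w, f in zip(weights, features)]
                bias += target
                mistakes += 1
        loss = mistakes / len(X)
        if previous_loss - loss < tol:
            break
        previous_loss = loss
    return weights, bias


def predict(weights, bias, features):
    """Return +1 or -1 depending on the side of the decision boundary."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else -1


# Toy data: one rescaled answer per person (hypothetical values).
X = [[-1.0], [-0.5], [0.5], [1.0]]
y = [-1, -1, 1, 1]
weights, bias = train_perceptron(X, y)
predictions = [predict(weights, bias, row) for row in X]
print(weights, bias, predictions)
```

On this separable toy set the error rate drops to zero in the second epoch, and the third
epoch triggers the stop because no further improvement is possible.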
Figure 2. Architecture of a neural network model.
Source: own elaboration.
Figure 3. Training scheme for each model.
Source: own elaboration.
For training, the data were randomly divided into training and test sets in the proportions
shown in Table 3. Each model was trained with the scheme in Figure 3.
Table 3. Proportion of training and test data.
Label      Set        Training/test proportion (%)
                      50/50   60/40   70/30   80/20   90/10
Divorced   Training   42      50      59      67      76
           Test       42      34      25      17      8
Married    Training   43      52      60      69      77
           Test       43      34      26      17      9
Source: own elaboration.
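The counts in Table 3 are consistent with splitting each class separately and rounding the
training share to the nearest integer. The sketch below reproduces them under that
assumption; the paper only states that the division was random, so the per-class procedure,
the seed and the function names are illustrative.

```python
import random


def split_counts(n_samples, train_percent):
    """Number of training and test samples for one class."""
    n_train = round(n_samples * train_percent / 100)
    return n_train, n_samples - n_train


def stratified_split(samples_by_class, train_percent, seed=0):
    """Shuffle each class separately, then cut it at the training proportion."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        n_train, _ = split_counts(len(shuffled), train_percent)
        train += [(sample, label) for sample in shuffled[:n_train]]
        test += [(sample, label) for sample in shuffled[n_train:]]
    return train, test


# Class sizes from Section 2.1; the integer sample IDs are placeholders.
data = {"divorced": list(range(84)), "married": list(range(84, 170))}
for percent in (50, 60, 70, 80, 90):
    train, test = stratified_split(data, percent)
    print(percent, split_counts(84, percent), split_counts(86, percent),
          len(train), len(test))
```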
Table 4. Models used for prediction.
ID   Models
M1   Perceptron
M2   Logistic Regression
M3   Neural Networks
M4   Random Forest
H1   Perceptron, Logistic Regression and Neural Networks
H2   Perceptron, Neural Networks and Random Forest
H3   Perceptron, Logistic Regression and Random Forest
Source: own elaboration.
Figure 4. Voting criteria for hybrid models.
Source: own elaboration.
The proposed model was implemented in Python 3 on Google Colab (Carneiro,
Medeiros, & Nepomuceno, 2018), using a 2.3 GHz Xeon CPU with 13 GB of RAM and an
Nvidia Tesla V100 graphics card with 16 GB of memory.
2.5. EVALUATION
To measure classification performance, the proposed model was evaluated in terms of
sensitivity (Sen), specificity (Spe) and accuracy (Acc).
A divorced person correctly classified is called a "true positive" (TP). A married person
correctly classified is called a "true negative" (TN). When a divorced person is classified
as married, it is called a "false negative" (FN), and when a married person is classified as
divorced, it is called a "false positive" (FP).
Sensitivity is the proportion of divorced people correctly classified, defined as (Lyusin &
Ovsyannikova, 2016):
$$\mathrm{Sen} = \frac{TP}{TP + FN} \qquad (1)$$
Specificity is the proportion of married people correctly classified. It is calculated as
follows (Glaros & Kline, 1988):
$$\mathrm{Spe} = \frac{TN}{TN + FP} \qquad (2)$$
Accuracy is the proportion of all people correctly classified, obtained with the formula
(Pedersen, Cheng, & Rasmussen, 1989):
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)$$
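As a worked illustration, the three metrics can be computed from a list of true and predicted
labels using their standard definitions, with "divorced" as the positive class. The label
strings and the toy test set below (7 divorced and 10 married people, one of whom is
misclassified) are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="divorced"):
    """Count TP, FP, TN and FN, treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + tn + fp + fn)


# Toy test set: every divorced person is detected, one married person is not.
y_true = ["divorced"] * 7 + ["married"] * 10
y_pred = ["divorced"] * 7 + ["divorced"] + ["married"] * 9

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sen, spe, acc = sensitivity(tp, fn), specificity(tn, fp), accuracy(tp, fp, tn, fn)
print(f"Sen={sen:.4f} Spe={spe:.4f} Acc={acc:.4f}")  # Sen=1.0000 Spe=0.9000 Acc=0.9412
```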
On the other hand, hybrid models are evaluated by the voting criterion (see Figure 4),
where the label that was repeated the most is selected.
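The voting rule can be sketched in a few lines. This is an illustration of plain majority
voting over three base models, with hypothetical predictions (1 = divorced, 0 = married);
the function names are not taken from the authors' repository.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label predicted most often among the base models."""
    return Counter(votes).most_common(1)[0][0]


def hybrid_predict(per_model_predictions):
    """Combine one prediction list per model into one label per sample."""
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


# Hypothetical predictions of three base models for four people.
perceptron = [1, 0, 1, 0]
logistic_regression = [1, 1, 1, 0]
neural_network = [0, 0, 1, 0]

hybrid = hybrid_predict([perceptron, logistic_regression, neural_network])
print(hybrid)  # [1, 0, 1, 0]
```

With three voters and two labels a tie cannot occur, which is one reason to combine an odd
number of models.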
3. RESULTS
In this work, multiple experiments were run with the four models defined in the
"Classification" section and the three hybrid models built from them. They were trained with
the proportions defined in Table 3. Training the models with the dataset of Yöntem et al.
(2019) yields the results in Tables 5, 6 and 7.
Table 5. Accuracy results of the training.
                      Training/Test (%)
Model                 50/50    60/40    70/30    80/20    90/10
Perceptron            0.9529   0.9853   0.9608   0.9412   0.9412
Logistic Regression   0.9412   0.9559   0.9804   0.9706   0.9412
Neural Networks       0.9647   0.9559   0.9804   0.9706   0.9412
Random Forest         0.9294   0.9412   0.9804   0.9706   0.9412
H1*                   0.9647   0.9853   0.9804   0.9706   0.9412
H2*                   0.9412   0.9706   0.9804   0.9706   0.9412
H3*                   0.9412   0.9706   0.9804   0.9706   0.9412
Source: own elaboration.
Table 6. Sensitivity results of the training.
                      Training/Test (%)
Model                 50/50    60/40    70/30    80/20    90/10
Perceptron            0.9783   1.0000   0.9630   0.9444   1.0000
Logistic Regression   1.0000   1.0000   1.0000   1.0000   1.0000
Neural Networks       1.0000   1.0000   1.0000   1.0000   1.0000
Random Forest         1.0000   1.0000   1.0000   1.0000   1.0000
H1*                   1.0000   1.0000   1.0000   1.0000   1.0000
H2*                   1.0000   1.0000   1.0000   1.0000   1.0000
H3*                   1.0000   1.0000   1.0000   1.0000   1.0000
Source: own elaboration.
Table 7. Specificity results of the training.
                      Training/Test (%)
Model                 50/50    60/40    70/30    80/20    90/10
Perceptron            0.9231   0.9667   0.9583   0.9375   0.9000
Logistic Regression   0.8718   0.9000   0.9583   0.9375   0.9000
Neural Networks       0.9231   0.9667   0.9583   0.9375   0.9000
Random Forest         0.8462   0.8667   0.9583   0.9375   0.9000
H1*                   0.9231   0.9667   0.9583   0.9375   0.9000
H2*                   0.8718   0.9333   0.9583   0.9375   0.9000
H3*                   0.8718   0.9333   0.9583   0.9375   0.9000
Source: own elaboration.
4. CONCLUSIONS
In this work, 7 models were used for the prediction of divorce, trained with the dataset from
Yöntem et al. (2019) and the dataset collected in this research. Each of these models
was trained in 5 different scenarios, for a total of 35 experiments. Among these, the
best results were obtained with the perceptron model and the first hybrid model; however,
due to the amount of data, the hybrid models did not perform better than the individual
models.
Although the results show high performance, the context, the amount of data and the
country in which the data were collected must be considered. In the future, we plan to
collect data on couples from different countries in order to extend the dataset, retrain
the models and evaluate their performance.
Divorce is a major problem, especially in a context of confinement, where the number of
divorced couples has increased considerably, indirectly affecting the closest members of
the family (such as children). Couples can also lose a lot by going through a divorce process.
This study can help them prevent these consequences. The prediction models in this study
could help people decide whether or not to marry and give them the opportunity, based on
compatibility, to have a successful marriage.
The best performance of a model was obtained with the 60/40 ratio of training to test data:
0.9853 accuracy, 1.0 sensitivity and 0.9667 specificity. We make the models and training
results available in our GitHub repository (https://github.com/NahumFGz/DivorcePredict).
REFERENCES
Amiriparian, S., Awad, A., Gerczuk, M., Stappen, L., Baird, A., Ottl, S., &
Schuller, B. (2019). Audio-based Recognition of Bipolar Disorder Utilising Capsule
Networks. IEEE Xplore. https://doi.org/10.1109/ijcnn.2019.8852330
Betzig, L. (1989). Causes of Conjugal Dissolution: A Cross-cultural Study. Current
Anthropology, 654-676.
Carneiro, T., Medeiros, R., & Nepomuceno, T. (2018). Performance Analysis of
Google Colaboratory as a Tool for Accelerating Deep Learning Applications. IEEE,
9. https://doi.org/10.1109/ACCESS.2018.2874767
Díez, P. M. (2020, March 21). Divorce epidemic in China due to coronavirus quarantines. ABC.
https://www.abc.es/sociedad/abci-epidemia-divorcios-china-cuarentenas-coronavirus-202003200152_noticia.html
Glaros, A., & Kline, R. (1988). Understanding the accuracy of tests with cutting scores:
The sensitivity, specificity, and predictive value model. Journal of Clinical Psychology,
44(6), 1013-1023.
https://doi.org/10.1002/1097-4679(198811)44:6<1013::aid-jclp2270440627>3.0.co;2-z
Gottman, J., & Silver, N. (2014). How to Maintain Love. Secrets from the Love Lab. Varlık
Publications.
Gottman, J., & Silver, N. (2015). Seven Principles of Keeping Married. Varlık Publications.
Großmann, I., Hottung, A., & Krohn-Grimberghe, A. (2019). Machine learning meets
partner matching: Predicting the future relationship quality based on personality
traits. PLOS ONE, 14(3).
INE. (2018, September 30). Eurostat Statistics Explained.
https://www.ine.es/prodyser/espa_cifras/2018/14/
INEGI. (2019). INEGI. Instituto Nacional de Estadística y Geografía.
https://www.inegi.org.mx/temas/nupcialidad/
INEI. (2010). National Institute of Statistics and Informatics.
https://www.inei.gob.pe/media/MenuRecursivo/publicaciones_digitales/Est/Lib1045/cap04.pdf
INEI. (2018). National Institute of Statistics and Informatics.
https://www.inei.gob.pe/media/MenuRecursivo/publicaciones_digitales/Est/Lib1698/libro.pdf
Joel, S., Eastwick, P., & Finkel, E. (2017). Is Romantic Desire Predictable? Machine
Learning Applied to Initial Romantic Attraction. Psychological Science, 28(10), 1478-1489.
Kuncheva, L., & Rodríguez, J. (2012). A weighted voting framework for classifiers
ensembles. Knowledge and Information Systems, 38(2), 259-275.
https://doi.org/10.1007/s10115-012-0586-6
Legal Templates. (2020, July 29). US Divorce Rates Soar During COVID-19 Crisis.
https://legaltemplates.net/resources/personal-family/divorce-rates-covid-19/#divorces-increase-in-couples-with-children
Liu, S., Reviriego, P., Lombardi, F., & Hernandez, J. A. (2020). Voting Margin: A
Scheme for Error-Tolerant k Nearest Neighbors Classifiers for Machine Learning.
IEEE Transactions on Emerging Topics in Computing.
https://doi.org/10.1109/tetc.2019.2963268
Liu, Y., Mu, Y., Chen, K., Li, Y., & Guo, J. (2020). Daily Activity Feature Selection
in Smart Homes Based on Pearson Correlation Coefficient. Neural Processing Letters,
51, 1771-1787. https://doi.org/10.1007/s11063-019-10185-8
Lyusin, D., & Ovsyannikova, V. (2016). Measuring two aspects of emotion recognition
ability: Accuracy vs. sensitivity. Learning and Individual Differences, 52, 129-136.
https://doi.org/10.1016/j.lindif.2015.04.010
Mark, K. P., Janssen, E., & Milhausen, R. R. (2011). Infidelity in Heterosexual Couples:
Demographic, Interpersonal, and Personality-Related Predictors of Extradyadic
Sex. Archives of Sexual Behavior, 40(5), 971-982.
https://doi.org/10.1007/s10508-011-9771-z
Narendran, D. J., Abilash, R., & Charulatha, B. S. (2020). Exploration of Classification
Algorithms for Divorce Prediction. Proceedings of International Conference on Recent
Trends in Machine Learning, IoT, Smart Cities and Applications, 291-303.
https://doi.org/10.1007/978-981-15-7234-0_25
Pedersen, P., Cheng, G., & Rasmussen, J. (1989). On Accuracy Problems for Semi-
Analytical Sensitivity Analyses. Mechanics of Structures and Machines, 17(3), 373-384.
https://doi.org/10.1080/089054508915647
Pittman, F. (1994). Private Lies: Infidelity and the Betrayal of Intimacy. Artes Médicas.
Sánchez, T. (2019). Consequences of divorce on children. The need for a new way of intervening: The
joint work of a lawyer and a psychologist. https://eprints.ucm.es/54965/
Schweizer, V. (2020). Divorce: More than a Century of Change, 1900-2018. Family Profiles,
1-2.
https://www.bgsu.edu/content/dam/BGSU/college-of-arts-and-sciences/NCFMR/documents/FP/schweizer-divorce-century-change-1900-2018-fp-20-22.pdf
Singh, D., & Singh, B. (2019). Investigating the impact of data normalization on
classification performance. Applied Soft Computing, 105524.
https://doi.org/10.1016/j.asoc.2019.105524
Yöntem, M., Adem, K., İlhan, T., & Kılıçarslan, S. (2019). Divorce Prediction Using
Correlation Based Feature Selection And Artificial Neural Networks. Nevşehir Hacı
Bektaş Veli University SBE Dergisi, 9(1), 259-273.
https://dergipark.org.tr/tr/pub/nevsosbilen/issue/46568/549416
Zordan, E. P., & Strey, M. N. (2011). Marital separation: Aspects involved in this decision, reversion and
future projects.
https://www.semanticscholar.org/paper/Separa%C3%A7%C3%A3o-conjugal%3A-aspectos-implicados-nessa-e-Zordan-Strey/6e25b5932d14c86e3aa3f191bb9761876e76eb9c