Research Article | OPEN ACCESS
Designing A Method for Alcohol Consumption Prediction Based on Clustering and Support Vector Machines
Mendoza-Palechor Fabio, De la Hoz-Manotas Alexis, Morales-Ortega Roberto, Martinez-Palacio Ubaldo, Diaz-Martinez Jorge and Combita-Nino Harold
Department of Computer Science and Electronics, Universidad de la Costa, Barranquilla, Colombia
Research Journal of Applied Sciences, Engineering and Technology 2017 4:146-154
Received: November 21, 2016 | Accepted: February 14, 2017 | Published: April 15, 2017
Abstract
This study implements several data mining techniques, namely decision trees, Support Vector Machines (SVM), Bayesian networks and K-Nearest Neighbors (KNN), and compares them using evaluation metrics such as True Positive Rate (TpRate), False Positive Rate (FpRate) and Recall. The comparison uses the “STUDENT ALCOHOL CONSUMPTION” dataset, which provides information on alcohol consumption among teenagers in Portugal. High alcohol consumption among teenagers, both high school and college students, has become a social problem: alarming data show that they start drinking between 10 and 14 years of age, which has a considerable impact on their behavior, especially in situations such as binge drinking. The results show that SVM achieves a better accuracy rate than the other techniques used and indicate that the proposed method is efficient and highly precise in detecting students who consume alcohol, improving on the results obtained in previous similar studies.
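As a rough illustration of the evaluation metrics named in the abstract, the sketch below computes the true positive rate and false positive rate for one class from predicted and actual labels. It is not the authors' code: the function name `binary_rates` and the label lists are hypothetical, and the labels are made-up values rather than data from the study. For a given class, the true positive rate and recall are the same quantity, TP / (TP + FN).

```python
def binary_rates(y_true, y_pred, positive=1):
    """Return (tp_rate, fp_rate) for the class marked as `positive`.

    tp_rate = TP / (TP + FN)   (the same quantity as recall)
    fp_rate = FP / (FP + TN)
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive case correctly detected
            else:
                fn += 1  # positive case missed
        else:
            if pred == positive:
                fp += 1  # negative case flagged as positive
            else:
                tn += 1  # negative case correctly rejected
    # Guard against division by zero when a class is absent.
    tp_rate = tp / (tp + fn) if (tp + fn) else 0.0
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return tp_rate, fp_rate


# Hypothetical labels (assumption): 1 = high consumption, 0 = low consumption.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tp_rate, fp_rate = binary_rates(y_true, y_pred)
print(f"TP rate (recall): {tp_rate:.3f}, FP rate: {fp_rate:.3f}")
```

A classifier is preferable on these metrics when its TP rate is higher and its FP rate is lower; comparing several techniques amounts to computing these rates for each one on the same held-out data.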
Keywords:
Alcohol consumption, Bayesian networks, data mining, Decision Trees (DT), K-Nearest Neighbors (KNN), Support Vector Machines (SVM)
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459