A comparative analysis of recurrent neural network and support vector machine for binary classification of spam short message service

David Odera 1, * and Gloria Odiaga 2

1 Tom Mboya University, Homabay-Kenya.
2 Jaramogi Oginga Odinga University of Science and Technology, Bondo-Kenya.
 
Research Article
World Journal of Advanced Engineering Technology and Sciences, 2023, 09(01), 127–152.
Article DOI: 10.30574/wjaets.2023.9.1.0142
Publication history: 
Received on 09 April 2023; revised on 26 May 2023; accepted on 28 May 2023
 
Abstract: 
Over the years, communication through Short Message Service (SMS) has been a primary tool for mobile subscribers. SMS has varied applications in health, industry, finance, education and social networking, among others. The growth of mobile devices and SMS usage has consequently increased the attack surface for cyber-criminals, culminating in the proliferation of malicious activities introduced through SMS spam, phishing, spyware, malware and similar threats. Ham messages are the normal messages that people exchange with one another and are usually wanted by the recipient, while spam messages are unsolicited, junk or redundant messages that may be sent to a large number of people at once and are usually unwanted. Various spam detection models have been developed using traditional machine learning and deep learning techniques. However, most studies that compare deep learning with traditional machine learning algorithms have omitted K-Nearest Neighbors and Support Vector Machine (SVM), which are empirically deemed the most popular traditional machine learning algorithms. In this study, therefore, we develop a deep learning model based on a Recurrent Neural Network (RNN) for spam and ham SMS classification and compare its performance against an SVM model on the same University of California, Irvine (UCI) SMS dataset. The results show that the RNN has a slightly higher training and validation accuracy of 0.98, compared to 0.94 for the SVM; however, the false positive rate of the SVM is marginally lower. Exploring the application of deep learning models such as RNN with better optimization algorithms improves accuracy, reduces computational complexity (i.e. memory consumption and processing time), and thus minimizes false positive rates. For future work, we suggest the use of varied performance metrics to validate the model in a distributed dataset environment.
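
The abstract compares the two classifiers on accuracy and false positive rate. As a minimal sketch of how these two metrics are typically derived from a confusion matrix, the Python snippet below uses a small set of made-up labels; the data and the helper names (`confusion_counts`, `accuracy`, `false_positive_rate`) are illustrative assumptions and are not taken from the study or its dataset.

```python
# Illustrative only: toy labels, not results from the paper.

def confusion_counts(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives, treating `positive` as the spam class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # spam correctly flagged
            else:
                fp += 1  # ham wrongly flagged as spam
        else:
            if truth == positive:
                fn += 1  # spam that slipped through
            else:
                tn += 1  # ham correctly passed
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all messages classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def false_positive_rate(fp, tn):
    """Fraction of ham messages wrongly flagged as spam."""
    ham_total = fp + tn
    return fp / ham_total if ham_total else 0.0


# Hypothetical ground truth and predictions for eight messages.
y_true = ["ham", "ham", "ham", "ham", "spam", "spam", "ham", "spam"]
y_pred = ["ham", "ham", "spam", "ham", "spam", "ham", "ham", "spam"]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(tp, fp, tn, fn))
print("false positive rate:", false_positive_rate(fp, tn))
```

Reporting both numbers matters because a model can score higher on accuracy while still flagging more legitimate (ham) messages as spam, which is the kind of trade-off the abstract describes between the RNN and the SVM.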
 
Keywords: 
Spam; Ham; Recurrent Neural Networks; Support Vector Machine; Comparison; Short Message Service
 