A Comprehensive Study on YouTube Spam Comments Recognition using Python, AI and ML

Uppe Nanaji*, C P V N J Mohan Rao**, Vara Prasad K.***
*-*** Department of Computer Science and Engineering, Avanthi Institute of Engineering and Technology, Anakapalle, Andhra Pradesh, India.
Periodicity: January - June 2025

Abstract

The proliferation of spam comments on platforms like YouTube poses a significant challenge, degrading user experience and potentially spreading malicious content or misinformation. This paper provides a comprehensive overview of developing a YouTube spam comment recognition system utilizing Python, Artificial Intelligence (AI), and Machine Learning (ML) techniques. It details the entire pipeline, from data acquisition and preprocessing of textual comment data to feature extraction methodologies suitable for text, selection and training of appropriate ML and AI models, and robust evaluation strategies. The paper also discusses key Python libraries and frameworks instrumental in implementing such systems, alongside an exploration of common challenges like evolving spam tactics and dataset imbalances, and outlines potential future research directions in this domain. The goal is to equip researchers and practitioners with a foundational understanding for building effective automated spam detection systems for online comment sections.
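
The pipeline stages named in the abstract (preprocessing, feature extraction, model training, and evaluation) can be illustrated with a minimal sketch that uses only the Python standard library. It is not the authors' implementation. The multinomial Naive Bayes classifier over bag-of-words counts, the toy comments, the example URLs, and all names (`preprocess`, `NaiveBayesSpamFilter`, `precision_recall_f1`) are assumptions made purely for illustration.

```python
import math
import re
from collections import Counter

# Assumed preprocessing rules: URLs collapse to one placeholder token,
# everything else is lowercased and split into alphanumeric tokens.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TOKEN_RE = re.compile(r"[a-z0-9_]+")


def preprocess(comment):
    """Lowercase a comment, replace URLs with 'urltoken', and tokenize."""
    text = URL_RE.sub(" urltoken ", comment.lower())
    return TOKEN_RE.findall(text)


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over word counts with Laplace smoothing."""

    def fit(self, comments, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for comment, label in zip(comments, labels):
            self.word_counts[label].update(preprocess(comment))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, comment):
        # Words never seen in training are ignored.
        tokens = [t for t in preprocess(comment) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)  # class prior
            for t in tokens:
                smoothed = (self.word_counts[c][t] + 1) / (total_words + len(self.vocab))
                score += math.log(smoothed)
            scores[c] = score
        return max(scores, key=scores.get)


def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Precision, recall, and F1 for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy, hand-written comments (not from any real dataset).
TRAIN_COMMENTS = [
    "Check out my channel and subscribe for free gift cards http://spam.example",
    "Subscribe to my channel please, free followers here www.spam.example",
    "Win free money now, click http://win.example",
    "Great video, the explanation of the chorus was really helpful",
    "I love this song, it reminds me of summer",
    "Thanks for the tutorial, very clear editing",
]
TRAIN_LABELS = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = NaiveBayesSpamFilter().fit(TRAIN_COMMENTS, TRAIN_LABELS)

TEST_COMMENTS = [
    "free gift cards, subscribe to my channel http://x.example",
    "I love the explanation in this video",
]
TEST_LABELS = ["spam", "ham"]
predictions = [model.predict(c) for c in TEST_COMMENTS]
precision, recall, f1 = precision_recall_f1(TEST_LABELS, predictions)
```

Reporting precision, recall, and F1 for the spam class, rather than accuracy alone, is one common way to keep the evaluation meaningful when legitimate comments heavily outnumber spam, which is the dataset-imbalance concern the abstract raises.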

Keywords

Spam Comments, Text Preprocessing, Feature Extraction, Text Classification, Evaluation Strategies, Dataset Imbalance.

How to Cite this Article?

Nanaji, U., Rao, C. P. V. N. J. M., and Prasad, K. V. (2025). A Comprehensive Study on YouTube Spam Comments Recognition using Python, AI and ML. International Journal of Communication and Networking System, 14(1), 36-45.
