i-manager Publications

A Comprehensive Study on YouTube Spam Comments Recognition using Python, AI and ML

Uppe Nanaji*, C P V N J Mohan Rao**, Vara Prasad K.***

*-*** Department of Computer Science and Engineering, Avanthi Institute of Engineering and Technology, Anakapalle, Andhra Pradesh, India.

Periodicity: January - June 2025

Abstract

The proliferation of spam comments on platforms like YouTube poses a significant challenge, degrading user experience and potentially spreading malicious content or misinformation. This paper provides a comprehensive overview of developing a YouTube spam comment recognition system utilizing Python, Artificial Intelligence (AI), and Machine Learning (ML) techniques. It details the entire pipeline, from data acquisition and preprocessing of textual comment data to feature extraction methodologies suitable for text, selection and training of appropriate ML and AI models, and robust evaluation strategies. The paper also discusses key Python libraries and frameworks instrumental in implementing such systems, alongside an exploration of common challenges like evolving spam tactics and dataset imbalances, and outlines potential future research directions in this domain. The goal is to equip researchers and practitioners with a foundational understanding for building effective automated spam detection systems for online comment sections.
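The pipeline outlined in the abstract (preprocessing, feature extraction, classification, evaluation) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The names `preprocess`, `NaiveBayesSpamFilter` and `precision_recall_f1`, the URL-placeholder preprocessing rule, and the six toy comments are all hypothetical. Multinomial Naive Bayes over bag-of-words counts is used here only as a simple baseline classifier. Because spam datasets are often imbalanced, the sketch reports precision, recall and F1 for the spam class instead of plain accuracy.

```python
import math
import re
from collections import Counter

# Assumption: links are a useful spam signal, so every URL becomes one placeholder token.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TOKEN_RE = re.compile(r"[a-z0-9]+")


def preprocess(comment: str) -> list[str]:
    """Lowercase, replace URLs with 'urltoken', and split into alphanumeric tokens."""
    text = URL_RE.sub(" urltoken ", comment.lower())
    return TOKEN_RE.findall(text)


class NaiveBayesSpamFilter:
    """Hypothetical multinomial Naive Bayes on word counts with add-one smoothing."""

    def fit(self, comments: list[str], labels: list[int]) -> "NaiveBayesSpamFilter":
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for comment, label in zip(comments, labels):
            self.word_counts[label].update(preprocess(comment))
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        self.vocab = set().union(*self.word_counts.values())
        return self

    def _log_score(self, tokens: list[str], c: int) -> float:
        # Log prior plus the smoothed log-likelihood of each token under class c.
        denom = self.total_words[c] + len(self.vocab)
        return self.log_prior[c] + sum(
            math.log((self.word_counts[c][t] + 1) / denom) for t in tokens
        )

    def predict(self, comments: list[str]) -> list[int]:
        predictions = []
        for comment in comments:
            # Tokens never seen in training carry no evidence, so they are skipped.
            tokens = [t for t in preprocess(comment) if t in self.vocab]
            predictions.append(max(self.classes, key=lambda c: self._log_score(tokens, c)))
        return predictions


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: 1 = spam, 0 = legitimate comment.
TRAIN_COMMENTS = [
    "Check out my channel for free gift cards http://spam.example",
    "Subscribe to my channel and win a free iPhone",
    "Click here www.promo.example free money",
    "Great song, I love the chorus",
    "This video helped me understand the topic",
    "The guitar solo at 2:30 is amazing",
]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]
model = NaiveBayesSpamFilter().fit(TRAIN_COMMENTS, TRAIN_LABELS)

test_comments = ["free gift on my channel", "I love this song"]
test_labels = [1, 0]
predicted = model.predict(test_comments)
print(predicted)  # [1, 0]
print(precision_recall_f1(test_labels, predicted))  # (1.0, 1.0, 1.0)
```

In practice, scikit-learn [9] provides tested equivalents of these pieces (for example `TfidfVectorizer`, `MultinomialNB` and `classification_report`), and resampling methods such as SMOTE [3] are one option when spam examples are scarce.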

Keywords

Spam Comments, Text Preprocessing, Feature Extraction, Text Classification, Evaluation Strategies, Dataset Imbalance.

How to Cite this Article?

Nanaji, U., Rao, C. P. V. N. J. M., and Prasad, K. V. (2025). A Comprehensive Study on YouTube Spam Comments Recognition using Python, AI and ML. International Journal of Communication and Networking System, 14(1), 36-45.

References

[1]. Agarwal, R., Dhoot, A., Kant, S., Bisht, V. S., Malik, H., Ansari, M. F., & Hossaini, M. A. (2024). A novel approach for spam detection using natural language processing with AMALS models. IEEE Access, 12, 124298-124313.

[2]. Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media.

[3]. Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357.

[4]. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 4171-4186).

[5]. Ebrahimi, J., Rao, A., Lowd, D., & Dou, D. (2017). HotFlip: White-box adversarial examples for text classification. arXiv preprint arXiv:1712.06751.

[6]. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.

[7]. Javed, S., Afzal, H., Arif, F., & Majeed, A. (2016). Reputation management system for fostering trust in collaborative and cohesive disaster management. International Journal of Advanced Computer Science and Applications, 7(7), 347-357.

[8]. O'Callaghan, D., Harrigan, M., Carthy, J., & Cunningham, P. (2012). Network analysis of recurring YouTube spam campaigns. In Proceedings of the International AAAI Conference on Web and Social Media, 6(1), 531-534.

[9]. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, 2825-2830.

[10]. Ramos, J. (2003). Using tf-idf to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning, 242(1), 29-48.

[11]. Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144).

[12]. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems.

[13]. Zhang, X., Zhao, J., & LeCun, Y. (2015). Character-level convolutional networks for text classification. Advances in Neural Information Processing Systems.
