2026-06-10T11:36:43.565Z http://ezaposleni.singidunum.ac.rs/rest/sciNaucniRezultati/oai

ezaposleni.singidunum.ac.rs/rest/sciNaucniRezultati/oai:2:11924 2026-05-17T22:59:37Z 2

Comparative evaluation of natural language processing approaches with particle swarm optimized LightGBM for anomaly detection in cloud system logs
2026
http://ezaposleni.singidunum.ac.rs/rest/sciNaucniRezultati/oai/record/2/11924
https://www.nature.com/articles/s41598-026-51889-x
S. Djukanovic, S. Stojanovic, B. Nikolic, S. Anetic, T. Zivkovic, M. Zivkovic, V. Simic, N. Bacanin

Effective log assessment is a cornerstone for ensuring the reliability and smooth functioning of large-scale, cloud-based infrastructures. As these environments continuously grow in scope and intricacy, traditional monitoring approaches often struggle to keep up, leading to unnecessary computational strain and diminished system performance. To address these challenges, this research introduces a hybrid framework that integrates natural language processing (NLP) with an optimized LightGBM classifier, tailored for anomaly detection in cloud-generated system logs. Central to the design is a thorough preprocessing pipeline that filters irrelevant information and minimizes bias, thereby sharpening anomaly detection accuracy. The experimental setup explored multiple NLP-based preprocessing strategies, including TF-IDF, BERT, and Word2Vec implementations (employing spaCy and Gensim). Classification relied on the LightGBM model, whose hyperparameters were refined using a customized particle swarm optimization (PSO) metaheuristic. This modified optimization routine boosted both predictive accuracy and model robustness. Results from the study reveal that the combined system significantly enhanced both the identification and classification of anomalies in cloud logs. The most effective configuration achieved up to 100% accuracy under the evaluated experimental conditions, highlighting the framework’s potential to strengthen security within cloud ecosystems. To ensure statistical soundness, extensive comparative evaluations were conducted across all configurations. Additionally, model interpretability was improved through SHapley Additive exPlanations (SHAP), which provided transparent insights into the role of individual features in shaping classification outcomes.

article 10.1038/s41598-026-51889-x 1 54 2045-2322 Scientific Reports
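
The hyperparameter tuning step mentioned in the abstract can be illustrated with a minimal sketch of textbook particle swarm optimization. This is not the customized PSO variant the authors describe, whose modifications are not given in this record. The objective `toy_validation_error` is a hypothetical stand-in for a LightGBM validation error, with its minimum placed at arbitrarily chosen values; the search bounds, swarm size, and coefficients `w`, `c1`, `c2` are likewise assumptions chosen for illustration.

```python
import random


def pso(objective, bounds, n_particles=20, n_iters=60,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over the box `bounds` with standard PSO."""
    rng = random.Random(seed)
    dim = len(bounds)

    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers its own best position; the swarm shares one.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the coordinate back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)

            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val

    return gbest, gbest_val


def toy_validation_error(params):
    """Hypothetical smooth surrogate; NOT a real model evaluation."""
    learning_rate, num_leaves = params
    return (learning_rate - 0.1) ** 2 + ((num_leaves - 31) / 100) ** 2


# Assumed search ranges for two illustrative hyperparameters.
best_params, best_error = pso(toy_validation_error, [(0.01, 0.5), (8, 256)])
print(best_params, best_error)
```

In a real tuning run, the surrogate would be replaced by a function that trains a classifier with the candidate hyperparameters and returns a validation error, and integer-valued parameters such as the number of leaves would need rounding before use.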