Vol. 15 No. 2 (2024)

Towards a semi-automatic classifier of malware through tweets for early warning threat detection

Claudia Lanza
University of Calabria
Lorenzo Lodi
Zanasi & Partners

Published 2024-05-15


Keywords

  • Malware
  • Classification
  • NLP
  • Twitter
  • Text Mining

How to Cite

Lanza, Claudia, and Lorenzo Lodi. 2024. “Towards a Semi-Automatic Classifier of Malware through Tweets for Early Warning Threat Detection.” JLIS.it 15 (2): 101-18. https://doi.org/10.36253/jlis.it-591.

Funding data

  • Ministero dell'Università e della Ricerca
    Grant numbers PON "Ricerca e Innovazione" 2014-2020, Axis IV, Action IV.4, Action IV.6, call DM 1062 of 10.08.2021, full-time RTD-A position, identification code 1062_R10_INNOVAZIONE, competition sector 11/A4, scientific-disciplinary sector M-STO/08.


Abstract

This paper presents a method for developing a malware ontology by detecting malware instances on Twitter. The ontology acts as a semi-automatic classifier fed by data extracted from tweets. The automatic part of the methodology relies on a pattern-based approach that detects trigger expressions leading to new information about malware, whilst the manual part covers the evaluation of the results by domain experts, who also validate the reliability of the semantic relationships within the ontology. We present preliminary results from applying the methodology to tweets extracted from the MalwareBazaar database, showing how analysing the document collection through Natural Language Processing (NLP) tasks can support knowledge retrieval and document classification when building an early warning system for detected malware. The results were obtained in 2023 and therefore refer to Twitter, the previous version of the current social network X.
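As an illustration of what a pattern-based search for trigger expressions might look like, the following minimal Python sketch scans tweet texts with two regular expressions and collects candidate (name, type) pairs for later review by domain experts. It is not the authors' implementation: the trigger patterns, the list of malware types, the function name `extract_candidates`, and the example tweets are all assumptions invented for this sketch.

```python
import re

# Hypothetical list of malware types; the paper's actual vocabulary is not given here.
TYPES = r"(?P<type>(?i:ransomware|trojan|botnet|stealer|worm))"

# Hypothetical trigger patterns (assumptions, not taken from the paper).
TRIGGER_PATTERNS = [
    # "<Name> is a (new) <type>"
    re.compile(r"\b(?P<name>[A-Z][\w-]+) is an? (?:new )?" + TYPES + r"\b"),
    # "new <type> called/dubbed/named <Name>"
    re.compile(r"\b(?i:new) " + TYPES + r" (?i:called|dubbed|named) (?P<name>[A-Z][\w-]+)"),
]


def extract_candidates(tweets):
    """Return candidate malware mentions found by the trigger patterns.

    Each candidate keeps the source tweet so that a domain expert can
    validate it manually before it is added to the ontology.
    """
    candidates = []
    for tweet in tweets:
        for pattern in TRIGGER_PATTERNS:
            for match in pattern.finditer(tweet):
                candidates.append(
                    {
                        "name": match.group("name"),
                        "type": match.group("type").lower(),
                        "source": tweet,
                    }
                )
    return candidates


if __name__ == "__main__":
    # Invented example tweets, used only to exercise the patterns.
    example_tweets = [
        "Emotet is a trojan spreading again via email",
        "Researchers spotted a new stealer dubbed Lumma in the wild",
        "Nice weather today",
    ]
    for candidate in extract_candidates(example_tweets):
        print(candidate["name"], "->", candidate["type"])
```

Keeping the source tweet next to each candidate reflects the semi-automatic design described in the abstract: the patterns only propose instances, and the decision to accept them stays with the human evaluators.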


