OntoEnricher: A Deep Learning Approach for Ontology Enrichment from Unstructured Text

doi:10.1201/9781003155799-9

ABSTRACT

Information security in the cyber world is a major concern, given the significant increase in attack surfaces. Existing information on vulnerabilities, attacks, controls, and advisories available on the web provides an opportunity to represent knowledge and perform security analytics that mitigate some of these concerns. Representing security knowledge in the form of an ontology facilitates anomaly detection, threat intelligence, reasoning, relevance attribution of attacks, and many other applications. This necessitates dynamic and automated enrichment of information security ontologies. However, existing ontology enrichment algorithms based on natural language processing and machine learning (ML) models struggle to extract concepts in context from words, phrases, and sentences. This motivates the need for sequential deep learning architectures that traverse dependency paths in text and extract embedded security-related concepts and instances from the learned path representations. In the proposed approach, Bidirectional LSTMs trained on a large DBpedia dataset and a Wikipedia corpus of 2.8 GB, along with the Universal Sentence Encoder, are deployed to enrich an ISO 27001-based information security ontology. The model was trained and tested in a high-performance computing (HPC) environment to handle the dimensionality of the Wikipedia text. To validate its robustness, it was tested on concepts knocked out of the ontology and on instances drawn from web pages, yielding a test accuracy of over 80%.