ABSTRACT

Accident reports provide a valuable source of data for any safety management system. In multilingual jurisdictions, accident reports can be provided in more than one language. For example, the Swiss transport authority collects accident reports written in German, French, or Italian. Because free text is unstructured, it is difficult to extract information from large numbers of accident reports. Machine reading of text is an emerging area of research; however, there are few instances of information being extracted from text in more than one language.

This paper introduces an ontology-based method of interactive learning between a human and computer software, which identifies safety-related information by analysing text written in three different languages. Fluent speakers of each language reviewed the results and rated the overall accuracy of the method at 98.5%.

The method stores and processes the data in a NoSQL graph database, which makes it straightforward to integrate the analysis with other data sources, such as train movement data, passenger census data, or comparative data from other railways.