ABSTRACT

Effectively combining multiple (and complementary) sources of information is becoming one of the most promising paths toward increased accuracy and more detailed analysis in numerous applications. Neuroscience, business analytics, military intelligence, and sociology are among the areas that could benefit significantly from properly processing diverse data sources. However, traditional approaches to combining multiple sources of information are slow or impractical, relying either on vast amounts of manual processing or on suboptimal representations of the data. We introduce an analytical framework that allows automatic and efficient processing of both hard (e.g., physics-based sensors) and soft (e.g., human-generated) information, leading to enhanced decision-making in multi-source environments.

This framework combines Natural Language Processing (NLP) methods for extracting information from soft data sources and the Dempster-Shafer (DS) Theory of Evidence as the common language for data representation and inference. The NLP module performs part-of-speech tagging, dependency parsing, and coreference resolution, followed by a conversion to semantics based on first-order logic (FOL) representations. Compared to other methods for handling uncertainty, DS theory provides an environment better suited for capturing the data models and imperfections that are common in soft data. We take advantage of the fact that the computational complexity typically associated with DS-based methods is continually decreasing, thanks both to the availability of better processing systems and to improved algorithms such as the conditional approach to evidence updating and fusion.

With an adequate environment for numerical modeling and processing, two additional elements become especially relevant, namely (1) assessing source credibility and (2) extracting meaning from the available data. Regarding (1), the lack of source credibility estimation (especially with human-generated information) can lead even the most powerful inference methods to the wrong conclusions. To address this issue, we present consensus algorithms that mutually constrain the data provided by the individual sources in order to assess their credibility. This process can be reinforced by incorporating (possibly partial) information from physical sensors to validate soft data. Once credibility has been estimated, every piece of information can be properly scaled prior to inference.

Meaning extraction (i.e., (2)) then becomes possible by applying the desired inference method. Special care must be taken to ensure that the selected inference method preserves, as much as possible, the quality and accuracy of the original data, as well as the relations among the different sources and among the data themselves. To accomplish this, we propose using FOL within the DS-theoretic framework. Under this approach, soft information (in the form of natural language) is analyzed syntactically and for discourse structure, and then converted into FOL statements representing its semantics. Processing these statements through an “uncertain logic” DS methodology yields bodies of evidence (BoEs) that can be fused with experts’ opinions stored in knowledge bases to provide accurate answers to a wide variety of queries. Examples of queries include finding or refining groups of suspects at a crime scene, validating the credibility of witnesses, and categorizing data on the web. When hard-sensor data is also incorporated into the inference process, challenging applications such as multi-source detection, tracking, and intent detection can also be addressed with the proposed solution.
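To make the soft-data path concrete, the following is a minimal sketch of the first two NLP steps (part-of-speech tagging and dependency parsing) followed by a toy conversion to FOL-style predicates. It assumes spaCy and its en_core_web_sm model are available; coreference resolution and the full semantic conversion described above are omitted, and the extraction rule is an illustrative assumption rather than the actual pipeline.

```python
# Sketch of the NLP front end: POS tagging and dependency parsing with spaCy,
# plus a toy rule that turns simple subject-verb-object clauses into
# Predicate(arg1, arg2) strings. Coreference resolution is not handled here.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def sentence_to_fol(sentence: str) -> list[str]:
    """Return FOL-style predicate strings for simple clauses in the sentence."""
    doc = nlp(sentence)
    statements = []
    for token in doc:
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.children if c.dep_ in ("dobj", "attr")]
            for subj in subjects:
                for obj in objects:
                    statements.append(f"{token.lemma_.capitalize()}({subj.text}, {obj.text})")
    return statements

# Example usage; the exact output depends on the parser model.
print(sentence_to_fol("Alice saw the suspect near the warehouse."))
# Expected (approximately): ['See(Alice, suspect)']
```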
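The DS machinery itself can be illustrated with Dempster's rule of combination, sketched below for two mass functions defined over a small, made-up frame of discernment. This shows only the classical combination rule; the conditional updating approach mentioned above is not reproduced here, and the frame and mass values are illustrative assumptions.

```python
# Worked sketch of Dempster's rule of combination. Focal elements are
# frozensets over an illustrative frame of discernment {"A", "B", "C"}.
from itertools import product

def dempster_combine(m1: dict, m2: dict) -> dict:
    """Combine two mass functions whose focal elements are frozensets."""
    combined = {}
    conflict = 0.0
    for (s1, w1), (s2, w2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:
            combined[inter] = combined.get(inter, 0.0) + w1 * w2
        else:
            conflict += w1 * w2  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("Totally conflicting evidence; combination undefined.")
    # Normalize by the non-conflicting mass (1 - K).
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

theta = frozenset({"A", "B", "C"})                    # frame of discernment
m_sensor = {frozenset({"A"}): 0.6, theta: 0.4}        # hard-sensor evidence
m_report = {frozenset({"A", "B"}): 0.7, theta: 0.3}   # soft (human) report
print(dempster_combine(m_sensor, m_report))
# -> {{'A'}: 0.60, {'A', 'B'}: 0.28, {'A', 'B', 'C'}: 0.12}
```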
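Finally, the consensus-based credibility assessment is only described at a high level above; the sketch below shows one hypothetical realization in which sources reporting a scalar quantity are iteratively reweighted according to their agreement with a credibility-weighted consensus, with a hard sensor helping to anchor the estimate. The scalar reports, the exponential penalty, and the iteration count are assumptions made for illustration, not the algorithm from this work.

```python
# Hypothetical sketch of consensus-based credibility scoring: sources that
# agree with the credibility-weighted consensus earn higher weights, which can
# then be used to discount each source before fusion.
import math

def consensus_credibility(reports: dict[str, float], iterations: int = 20,
                          sharpness: float = 2.0) -> dict[str, float]:
    credibility = {name: 1.0 for name in reports}  # start fully trusted
    for _ in range(iterations):
        total = sum(credibility.values())
        consensus = sum(credibility[n] * v for n, v in reports.items()) / total
        # Penalize deviation from the current consensus estimate.
        credibility = {n: math.exp(-sharpness * abs(v - consensus))
                       for n, v in reports.items()}
    return credibility

# Three witnesses and one physics-based sensor reporting, e.g., a range in km.
reports = {"witness_1": 4.9, "witness_2": 5.1, "witness_3": 9.0, "radar": 5.0}
print(consensus_credibility(reports))  # witness_3 receives a much lower score
```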