ABSTRACT

The ability of humans to reliably perceive and recognise objects relies on an interaction between information in the visual image and prior expectations. We describe an extension to the CHREST computational model that enables it to learn and combine information from multiple input modalities. Simulations demonstrate quantitative effects of cross-modal interactions on recognition ability. In particular, our simulations with CHREST illustrate how expectations can improve classification accuracy, reduce classification time, and enable words to be reconstructed from noisy visual input.