ABSTRACT

Neuroidentification – the effort to train neural nets to predict or simulate dynamical systems over time – is a key research priority, because it is crucial to the design of brain-like intelligent systems [1, 2, 3]. Unfortunately, most researchers treat this task as a straightforward application of supervised learning [4]: they build networks which draw all of their input from a fixed set of observed variables over a fixed window of time before the prediction target. They then adapt the weights of the net so that its outputs match the prediction target for those fixed inputs, exactly as they would in any static mapping problem.
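As a concrete illustration of this conventional procedure, the sketch below trains a small feedforward net on fixed windows of past observations, exactly as one would fit a static mapping. It is a minimal sketch only: the window length, network size, optimizer settings, and synthetic data are hypothetical choices for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical setup: predict x(t) from a fixed window of past observations.
WINDOW = 5                                  # fixed input window (assumption)
x = torch.randn(300)                        # stand-in for one observed variable

# Build (input, target) pairs exactly as in static supervised learning.
inputs = torch.stack([x[i:i + WINDOW] for i in range(len(x) - WINDOW)])
targets = x[WINDOW:].unsqueeze(1)

net = nn.Sequential(nn.Linear(WINDOW, 16), nn.Tanh(), nn.Linear(16, 1))
opt = torch.optim.SGD(net.parameters(), lr=1e-2)

# Adapt the weights so the outputs match the targets for those fixed inputs.
for epoch in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(inputs), targets)
    loss.backward()
    opt.step()
```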

With McAvoy and Su, I have compared the long-term prediction errors which result from this procedure against the errors which result from using a radically different training procedure – the pure robust method – to train exactly the same simple feedforward network, with the same inputs and targets. The reduction in average prediction error was 60%, across 11 predicted variables taken from 4 real-world chemical processes. More importantly, error was reduced for every variable, and reduced by a factor of 3 or more for 4 of the 11 variables [5, p. 319]. Follow-up work by Su [6, p. 92] studied 5 more chemical processes (mostly proprietary to major manufacturers), and found that the conventional procedure simply “failed” (relative to the pure robust procedure) in 3 of the 5. This paper describes how we did it; it also tries to correct common misconceptions about recurrent networks and to summarize future research needs.
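The pure robust procedure itself is described in the body of the paper, not in this abstract. Purely as a hedged illustration of the contrast, the sketch below trains the same kind of hypothetical net in closed-loop (simulation) mode: the net's own predictions are fed back as inputs, and error is accumulated over the whole trajectory, so gradients flow back through time. This is offered as an assumption about the general flavor of multi-step training, not as the authors' exact algorithm.

```python
import torch
import torch.nn as nn

WINDOW = 5                                  # same hypothetical setup as above
x = torch.randn(300)                        # stand-in observed variable
net = nn.Sequential(nn.Linear(WINDOW, 16), nn.Tanh(), nn.Linear(16, 1))
opt = torch.optim.SGD(net.parameters(), lr=1e-2)

# Hedged sketch of closed-loop (simulation-mode) training: the net's own
# outputs replace observed values in the input window, and the loss is
# summed over the whole simulated trajectory, so gradients propagate back
# through time. An assumption about the general idea, not the authors'
# exact pure robust algorithm.
for epoch in range(50):
    opt.zero_grad()
    window = x[:WINDOW].clone()             # seed with real observations
    loss = torch.zeros(())
    for t in range(WINDOW, len(x)):
        pred = net(window.unsqueeze(0)).squeeze()
        loss = loss + (pred - x[t]) ** 2
        # feed the prediction, not the observation, back into the window
        window = torch.cat([window[1:], pred.unsqueeze(0)])
    (loss / (len(x) - WINDOW)).backward()
    opt.step()
```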