ABSTRACT

One of the main reasons for the slow convergence and the suboptimal generalization of Multilayer Perceptrons (MLPs) trained with gradient descent is the lack of a proper initialization of the weights to be adjusted. Even sophisticated learning procedures cannot compensate for bad initial weight values, while a good initial guess leads to fast convergence and/or better generalization capability even with simple gradient-based error minimization techniques. Although the initial weight space of MLPs appears to be so critical, its properties have not been studied so far. This paper overviews MLP initialization procedures and experimentally studies the properties of the initial weight space of MLPs, using different clustering techniques on various tasks from the well-known UCI repository of benchmarks, in terms of partitioning such weight spaces into subspaces that are homogeneous with respect to speed of convergence and generalization capability.