ABSTRACT

We investigate the effectiveness of a new gradient descent method for Conditional Maximum Likelihood (CML) and Maximum Likelihood (ML) training of Hidden Markov Models (HMMs). Instead of using a single fixed learning rate for all adjustable parameters of the HMM, we propose independent learning rate adaptation for each parameter, a strategy that has proved valuable in training Artificial Neural Networks. On applications from molecular biology, this approach significantly outperforms standard gradient descent: convergence is up to five times faster, and the training procedure is more robust. We also show that, when the labels of the HMM are well defined, CML training performs better than ML; with the proposed approach, we may obtain these better results without any additional computational complexity or the need for parameter tuning.
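
The abstract does not specify the adaptation rule; as a minimal sketch, assuming an Rprop-like sign-based scheme of the kind used in neural network training, each parameter could carry its own step size that grows while the gradient sign is stable and shrinks when it flips. The function and parameter names below are illustrative, not the paper's exact algorithm, and the constraint that HMM transition and emission parameters stay on the probability simplex (e.g., via re-normalization or an exponential reparameterization) is omitted here.

```python
import numpy as np

def adaptive_step(params, grad, prev_grad, step,
                  eta_plus=1.2, eta_minus=0.5,
                  step_min=1e-6, step_max=1.0):
    """One per-parameter adaptive update (Rprop-style sketch).

    params, grad, prev_grad, step: 1-D arrays of the same length,
    holding the current parameters, the current and previous
    gradients of the log-likelihood, and the individual step sizes.
    """
    sign_change = grad * prev_grad
    # Gradient kept its sign: enlarge that parameter's step size.
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    # Gradient changed sign: the step overshot, so shrink it.
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    # Ascent on the log-likelihood; only the gradient's sign is used,
    # each parameter moving by its own step size.
    new_params = params + step * np.sign(grad)
    return new_params, step

# Toy usage: maximize -sum((params - 1)^2), whose gradient is 2*(1 - params).
params = np.zeros(4)
step = np.full(4, 0.1)
prev_grad = np.zeros(4)
for _ in range(50):
    grad = 2.0 * (1.0 - params)
    params, step = adaptive_step(params, grad, prev_grad, step)
    prev_grad = grad
print(params)  # approaches [1, 1, 1, 1]
```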