This thesis presents the application of a minimal radial basis function (RBF) neural network, referred to as MRAN (Minimal Resource Allocation Network), to speaker verification. An extension of MRAN to elliptical basis functions is also studied.
MRAN is a sequential learning algorithm for radial basis function neural networks. During training, MRAN adds or removes hidden neurons so as to realize a minimal network. It recruits a new hidden neuron when an input is sufficiently novel; if the novelty criteria are not all satisfied, the existing network parameters are instead updated by an extended Kalman filter (EKF). In addition, MRAN’s pruning strategy removes hidden neurons whose contribution to the output layer is insignificant. In this way, MRAN adapts its structure to closely fit the dynamics of the input data.
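The growth, EKF-update and pruning loop described above can be sketched in Python as follows. This is a minimal single-output sketch, assuming Gaussian units `exp(-||x - mu_k||^2 / sigma_k^2)`, the commonly cited MRAN novelty criteria (distance to the nearest centre above a decaying threshold `eps_n`, instantaneous error above `e_min`, and sliding-window RMS error above `e_rms_min`) and a normalised-output pruning rule. The class `MRANSketch`, its method names and all default hyperparameter values are hypothetical and are not taken from the thesis.

```python
import numpy as np


class MRANSketch:
    """Single-output MRAN sketch: Gaussian RBF units, growth, EKF update, pruning.

    Hyperparameter names and default values are illustrative assumptions.
    """

    def __init__(self, dim, eps_max=2.0, eps_min=0.2, gamma=0.99, e_min=0.05,
                 e_rms_min=0.05, window=20, kappa=0.8, p0=1.0, q=1e-3, r=1.0,
                 delta=0.01, prune_window=20):
        self.d = dim
        self.bias = 0.0
        self.alpha = np.zeros(0)            # output weights, one per hidden unit
        self.mu = np.zeros((0, dim))        # centres
        self.sigma = np.zeros(0)            # widths
        self.P = np.eye(1) * p0             # EKF covariance over the flat parameter vector
        self.eps_max, self.eps_min, self.gamma = eps_max, eps_min, gamma
        self.e_min, self.e_rms_min, self.window = e_min, e_rms_min, window
        self.kappa, self.p0, self.q, self.r = kappa, p0, q, r
        self.delta, self.prune_window = delta, prune_window
        self.errors, self.low_counts, self.n = [], np.zeros(0, dtype=int), 0

    def _phi(self, x):
        return np.exp(-np.sum((x - self.mu) ** 2, axis=1) / self.sigma ** 2)

    def predict(self, x):
        return self.bias + self.alpha @ self._phi(x)

    def _pack(self):
        # Flat layout: [bias, alpha_1, mu_1..., sigma_1, alpha_2, ...]
        units = np.column_stack([self.alpha, self.mu, self.sigma])
        return np.concatenate(([self.bias], units.ravel()))

    def _unpack(self, w):
        units = w[1:].reshape(-1, self.d + 2)
        self.bias = w[0]
        self.alpha, self.mu, self.sigma = units[:, 0], units[:, 1:-1], units[:, -1]

    def _grad(self, x):
        # Gradient of the network output with respect to the flat parameter vector.
        phi, diff, s2 = self._phi(x), x - self.mu, self.sigma ** 2
        blocks = [np.concatenate(([phi[k]],
                                  2 * self.alpha[k] * phi[k] * diff[k] / s2[k],
                                  [2 * self.alpha[k] * phi[k] * (diff[k] @ diff[k])
                                   / (s2[k] * self.sigma[k])]))
                  for k in range(len(phi))]
        return np.concatenate([[1.0]] + blocks)

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        self.n += 1
        e = y - self.predict(x)
        self.errors = (self.errors + [e])[-self.window:]
        e_rms = np.sqrt(np.mean(np.square(self.errors)))
        eps_n = max(self.eps_max * self.gamma ** self.n, self.eps_min)
        dist = np.min(np.linalg.norm(self.mu - x, axis=1)) if len(self.alpha) else np.inf
        if dist > eps_n and abs(e) > self.e_min and e_rms > self.e_rms_min:
            # All novelty criteria hold: allocate a new unit centred on x.
            width = self.kappa * (dist if np.isfinite(dist) else self.eps_max)
            self.alpha = np.append(self.alpha, e)
            self.mu = np.vstack([self.mu, x])
            self.sigma = np.append(self.sigma, width)
            self.low_counts = np.append(self.low_counts, 0)
            m = len(self.P)
            grown = np.eye(m + self.d + 2) * self.p0
            grown[:m, :m] = self.P
            self.P = grown
        else:
            # Otherwise adjust every parameter with one EKF step.
            a = self._grad(x)
            Pa = self.P @ a
            K = Pa / (self.r + a @ Pa)
            self._unpack(self._pack() + K * e)
            self.P = self.P - np.outer(K, Pa) + self.q * np.eye(len(self.P))
        self._prune(x)

    def _prune(self, x):
        if len(self.alpha) == 0:
            return
        out = np.abs(self.alpha * self._phi(x))
        ratio = out / max(out.max(), 1e-12)
        self.low_counts = np.where(ratio < self.delta, self.low_counts + 1, 0)
        keep = self.low_counts < self.prune_window
        if not keep.all():
            # Drop pruned units and the matching rows/columns of P.
            idx = np.concatenate(([0], 1 + np.flatnonzero(np.repeat(keep, self.d + 2))))
            self.P = self.P[np.ix_(idx, idx)]
            self.alpha, self.mu, self.sigma = self.alpha[keep], self.mu[keep], self.sigma[keep]
            self.low_counts = self.low_counts[keep]
```

Keeping all parameters in one flat vector reduces the EKF step to a few matrix operations; when a unit is added or pruned, the covariance `P` is either extended with a `p0`-scaled identity block or has the matching rows and columns removed.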
In this research project, we used MRAN to perform speaker verification and compared its results with those of conventional radial basis function (RBF) networks and elliptical basis function (EBF) networks. The TIMIT corpus is used as the speech database in our experiments. The experiment involves several phases. In the first phase, utterances spoken by the speaker are converted into linear predictive coefficients, which serve as feature vectors. These feature vectors are then fed to MRAN to train the network. After the network has been trained, its verification decision threshold is determined; this threshold is used to accept or reject identity claims based on the network output. Following this procedure, we trained 76 MRAN networks corresponding to 76 speakers from the TIMIT database.
The experimental results show that MRAN outperforms the conventional RBF network. Compared with EBF networks, MRAN produces comparable error rates at a much lower complexity in terms of the number of network parameters. The computational complexity of each network is also analyzed. The computational complexity of the RBF and EBF networks is , where H represents the total number of hidden neurons in the network and N represents the total number of training samples. Compared with RBF and EBF networks, the MRAN network requires a much lower computational complexity, which is .
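The feature extraction and threshold steps can be illustrated with a short sketch. It assumes the autocorrelation method with Levinson-Durbin recursion for the linear predictive coefficients, Hamming-windowed 25 ms frames with a 10 ms hop at TIMIT's 16 kHz sampling rate, a prediction order of 12, and a threshold chosen at the equal-error point of genuine and impostor scores (for example, the network output averaged over an utterance's frames). The function names, frame settings and the equal-error criterion are assumptions for illustration, not the thesis's exact procedure.

```python
import numpy as np


def lpc(frame, order):
    """LPC coefficients a_1..a_p (x[n] ~ sum_k a_k x[n-k]) via Levinson-Durbin."""
    r = np.array([frame[: len(frame) - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    if r[0] <= 0:                      # silent frame: no meaningful predictor
        return a[1:]
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - a[1:i] @ r[i - 1:0:-1]) / err   # reflection coefficient
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k                          # updated prediction error
    return a[1:]


def lpc_features(signal, frame_len=400, hop=160, order=12):
    """One LPC vector per Hamming-windowed frame (25 ms / 10 ms at 16 kHz, assumed)."""
    signal = np.asarray(signal, dtype=float)
    window = np.hamming(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([lpc(signal[s:s + frame_len] * window, order) for s in starts])


def eer_threshold(genuine, impostor):
    """Hypothetical rule: threshold where false rejection and false acceptance are closest."""
    genuine, impostor = np.asarray(genuine, float), np.asarray(impostor, float)
    candidates = np.sort(np.concatenate([genuine, impostor]))
    frr = np.array([np.mean(genuine < t) for t in candidates])
    far = np.array([np.mean(impostor >= t) for t in candidates])
    return candidates[np.argmin(np.abs(frr - far))]


def accept_claim(score, threshold):
    """Accept the identity claim when the network score reaches the threshold."""
    return score >= threshold
```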
We also extended MRAN to elliptical basis functions, referred to as MRAN-EBF, and compared MRAN-EBF with MRAN on speaker verification tasks. The experimental results show that MRAN-EBF with a full covariance matrix performs worse than MRAN, while MRAN-EBF with a diagonal covariance matrix produces error rates comparable to those of MRAN but at a higher complexity.
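To see why the covariance choice affects complexity, the sketch below evaluates an elliptical basis function with either a diagonal or a full covariance matrix and counts the parameters per hidden unit. It assumes the form `exp(-(x - mu)^T Sigma^{-1} (x - mu))`, which reduces to the Gaussian RBF `exp(-||x - mu||^2 / sigma^2)` when `Sigma = sigma^2 I`, and a single output weight per unit; the function names are hypothetical.

```python
import numpy as np


def ebf_activation(x, mu, cov):
    """exp(-(x - mu)^T cov^{-1} (x - mu)); cov is a (d,) diagonal or a full (d, d) matrix."""
    diff = np.asarray(x, float) - np.asarray(mu, float)
    cov = np.asarray(cov, float)
    if cov.ndim == 1:                          # diagonal covariance stored as variances
        dist2 = np.sum(diff ** 2 / cov)
    else:                                      # full covariance: solve instead of inverting
        dist2 = diff @ np.linalg.solve(cov, diff)
    return np.exp(-dist2)


def params_per_unit(d, kind):
    """Output weight + centre + shape parameters of one hidden unit."""
    shape = {"rbf": 1, "ebf_diag": d, "ebf_full": d * (d + 1) // 2}[kind]
    return 1 + d + shape
```

For 12-dimensional features, for instance, a full-covariance unit carries 91 parameters against 25 for a diagonal unit and 14 for a spherical RBF unit.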