Matching Targets for Automatic Speech Recognition
Automatic speech recognition usually aims at optimal recognition performance at the word level. This requires the decision rule and the training approach to support this target and to be consistent with each other. In practice, however, these requirements are not fully met because, among other reasons,
- the standard decision rule minimizes sentence/segment error instead of word error,
- the training criterion and the decision rule are often inconsistent,
- the recognition vocabulary is limited and therefore incomplete.
In this presentation, we will investigate why the standard decision rule, in spite of its sentence/segment-based target, often implicitly minimizes word error rate as well. We will also consider how the choice of training criterion can support the use of the standard decision rule while retaining the target of minimum word error rate. Finally, we will discuss how the out-of-vocabulary problem can be solved without blowing up the vocabulary, by incorporating character recognition into a standard word-based recognizer.
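To make the contrast between the two targets concrete, the following minimal sketch compares a sentence-level decision (pick the hypothesis with the highest posterior) with a word-level decision (pick the hypothesis with the lowest expected word edit distance). It is an illustration under stated assumptions, not the method presented in the talk: the function names, the restriction to a small explicit hypothesis list, and the posterior values in `toy_posterior` and `peaked_posterior` are all hypothetical.

```python
from typing import Dict, Sequence, Tuple

Hypothesis = Tuple[str, ...]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def sentence_level_decision(posterior: Dict[Hypothesis, float]) -> Hypothesis:
    """Standard rule: maximize the posterior, i.e. minimize sentence error."""
    return max(posterior, key=posterior.get)


def word_level_decision(posterior: Dict[Hypothesis, float]) -> Hypothesis:
    """Minimize the expected number of word errors under the posterior."""
    def expected_word_errors(candidate: Hypothesis) -> float:
        return sum(p * edit_distance(ref, candidate)
                   for ref, p in posterior.items())
    return min(posterior, key=expected_word_errors)


# Hypothetical posterior over three competing sentence hypotheses.
toy_posterior: Dict[Hypothesis, float] = {
    ("the", "cat", "sat"): 0.40,
    ("a", "cat", "sits"): 0.35,
    ("a", "cat", "sit"): 0.25,
}

# Hypothetical posterior in which one hypothesis clearly dominates.
peaked_posterior: Dict[Hypothesis, float] = {
    ("the", "cat", "sat"): 0.60,
    ("a", "cat", "sits"): 0.25,
    ("a", "cat", "sit"): 0.15,
}
```

With `toy_posterior`, the sentence-level rule selects `("the", "cat", "sat")`, while the word-level rule selects `("a", "cat", "sits")`, because the two lower-ranked hypotheses share most of their words and jointly outweigh the top one. With `peaked_posterior`, both rules select the same hypothesis, which illustrates on a toy scale why the standard rule often also yields a low word error rate.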
Ralf Schlueter studied physics at RWTH Aachen University, Germany, and Edinburgh University, UK. He received the Diplom degree with honors in physics in 1995 and the Dr. rer. nat. degree with honors in computer science in 2000, both from RWTH Aachen University. From November 1995 to April 1996, he was with the Institute for Theoretical Physics B at RWTH Aachen, where he worked on statistical physics and stochastic simulation techniques. Since May 1996, he has been with the Computer Science Department at RWTH Aachen University, where he is currently a senior researcher and leads the automatic speech recognition group at the Human Language Technology and Pattern Recognition institute. His research interests cover speech recognition, discriminative training, decision theory, stochastic modeling, and signal analysis.