The repeated use of out-of-vocabulary (OOV) words in a spoken
document seriously degrades a speech recognizer’s performance.
This paper proposes a novel method for accurately detecting such
recurrent OOV words. Standard OOV word detection methods
classify each word segment as either in-vocabulary
(IV) or OOV. This word-by-word classification tends to be
affected by sudden vocal irregularities in spontaneous speech,
triggering false alarms. To avoid this sensitivity to such
irregularities, our proposal focuses on the consistency of the repeated
occurrences of OOV words. The proposed method first detects
recurrent segments, i.e., segments that contain the same
word, in a spoken document by open-vocabulary spoken term
discovery using a phoneme recognizer. If the recurrent
segments are OOV words, features for OOV detection in those
segments should exhibit consistency. We capture this
consistency by using the mean and variance (distribution) of features
(DOF) derived from the recurrent segments, and use the DOF
for IV/OOV classification. Experiments show that using the
DOF significantly improves performance in recurrent OOV word
detection.
Index Terms: speech recognition, OOV word detection,
recurrent OOV words, distribution of features
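
As a rough illustration of the DOF idea described in the abstract, the sketch below computes the per-dimension mean and variance of OOV detection features over one group of recurrent segments. The function names, the list-of-lists feature layout, and the choice to append the DOF to each segment's own feature vector are assumptions made for this example only; the abstract does not specify how the DOF is combined with the per-segment features or which classifier consumes it.

```python
from statistics import fmean, pvariance


def distribution_of_features(segment_features):
    """Per-dimension mean and variance over one group of recurrent segments.

    segment_features: list of equal-length feature vectors, one per segment
    assumed to contain the same word (hypothetical layout).
    """
    if not segment_features:
        raise ValueError("need at least one segment")
    # Transpose so that each tuple holds one feature dimension across segments.
    dims = list(zip(*segment_features))
    means = [fmean(d) for d in dims]
    # Population variance; this is 0.0 when the group has a single segment.
    variances = [pvariance(d) for d in dims]
    return means, variances


def augment_with_dof(segment_features):
    """Append the group's DOF to every segment's feature vector (an assumption)."""
    means, variances = distribution_of_features(segment_features)
    return [list(f) + means + variances for f in segment_features]


if __name__ == "__main__":
    # Two recurrent segments, each with two hypothetical OOV detection features.
    group = [[1.0, 2.0], [3.0, 4.0]]
    print(distribution_of_features(group))
    print(augment_with_dof(group))
```

The augmented vectors could then be passed to any IV/OOV classifier; a group whose features show low variance reflects the consistency that the method relies on.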