Language identification in the spoken term detection system for the Kazakh language in a multilingual environment
Keywords:
Language identification, Long Short-Term Memory Recurrent Neural Networks, Automatic Speech Recognition, Spoken Term Detection
Abstract
Processing big data is currently one of the most important tasks in the IT industry, and audio
materials are one of the main sources of such data. Consequently, as the volume of audio
information grows, effective systems for retrieving information from audio materials, known as
spoken term detection (STD) systems, are needed. Since audio data can be in different languages,
it is essential to recognize the language of the audio. Automatic language identification (LID)
is the task of automatically determining the language spoken in a speech sample.
Recent progress in signal processing and in related fields such as pattern recognition, machine
learning and neural networks has improved LID performance. In this work we apply state-of-the-art
Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) to raw audio features in order to
identify audio samples in the Kazakh language. LSTM networks are a type of RNN that uses special
units in addition to standard ones. These LSTM units contain a «memory cell» that can retain
information for long periods of time. With LID, an STD system can select audio materials in
Kazakh and thus avoid spending computing resources on audio data in other languages. We report
the results of automatic speech recognition, spoken term detection and language identification
experiments with LSTM RNNs on 1 s, 2 s and 3 s segments of audio samples in the Kazakh language.
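
The ideas summarized above can be illustrated with a minimal sketch. It is not the system evaluated in this work: it only shows how a waveform might be cut into fixed-length segments, how a textbook LSTM step updates its memory cell, and how an LID decision could gate which segments reach the STD stage. The function names (split_into_segments, lstm_step, select_kazakh), the 16 kHz sample rate, the gate ordering and the is_kazakh predicate are all assumptions made for illustration.

```python
import numpy as np


def split_into_segments(samples, sample_rate, seconds):
    """Cut a 1-D waveform into non-overlapping segments of `seconds` length.

    A trailing remainder shorter than one full segment is dropped.
    """
    seg_len = int(sample_rate * seconds)
    n_segments = len(samples) // seg_len
    return samples[: n_segments * seg_len].reshape(n_segments, seg_len)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One time step of a standard LSTM cell (textbook formulation).

    Shapes, with D = input size and H = hidden size:
    W is (4H, D), U is (4H, H), b is (4H,).
    Gates are stacked in the assumed order: input, forget, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    g = np.tanh(z[2 * H:3 * H])    # candidate cell content
    o = sigmoid(z[3 * H:4 * H])    # output gate
    c = f * c_prev + i * g         # memory cell: keep old content, add new
    h = o * np.tanh(c)             # hidden state exposed to the next layer
    return h, c


def select_kazakh(segments, is_kazakh):
    """Keep only the segments accepted by a (hypothetical) LID predicate."""
    return [seg for seg in segments if is_kazakh(seg)]


if __name__ == "__main__":
    sample_rate = 16000  # assumed sample rate, not taken from the paper
    wave = np.zeros(sample_rate * 7, dtype=np.float32)
    for seconds in (1, 2, 3):
        print(seconds, split_into_segments(wave, sample_rate, seconds).shape)
```

In such a setup, only the segments returned by select_kazakh would be passed on to speech recognition and term detection, which is where the saving in computing resources would come from.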
