Khmer ASR

Building speech recognition for the Khmer language.

Khmer Phonemic Inventory

At a high level, automatic speech recognition (ASR) is a computer application that takes sound waves as input and produces the corresponding spoken words, phrases, or sentences as text. In other words, it is a transcription task that turns spoken articulation into written form. The fundamental first step in speech processing is to model the acoustic properties of the language and establish criteria for pronunciation, which is what makes transcription possible.

The section below roughly describes the acoustic features of the Khmer language and provides transcriptions in IPA and Arpabet format to represent the phonemes of each character.
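To make the relationship between the two notations concrete, here is a minimal Python sketch that converts an IPA string into Arpabet labels by greedy longest-match tokenization. The table `IPA_TO_ARPABET` and the function `ipa_to_arpabet` are hypothetical and cover only a small illustrative subset of symbols. The vowel and aspirate correspondences are approximations chosen for the example, not this project's actual table. Arpabet was designed for English and has no aspirated stops, so the sketch writes them as a stop followed by HH (an assumption).

```python
# Hypothetical, illustrative subset only: NOT a complete Khmer IPA-to-Arpabet table.
IPA_TO_ARPABET: dict[str, list[str]] = {
    # Arpabet has no aspiration marker; approximate aspirated stops as stop + HH.
    "pʰ": ["P", "HH"],
    "tʰ": ["T", "HH"],
    "kʰ": ["K", "HH"],
    "p": ["P"], "t": ["T"], "k": ["K"],
    "m": ["M"], "n": ["N"], "ŋ": ["NG"],
    "s": ["S"], "h": ["HH"], "l": ["L"], "r": ["R"],
    "w": ["W"], "j": ["Y"],
    # Vowel correspondences below are rough approximations.
    "i": ["IY"], "u": ["UW"], "e": ["EY"], "o": ["OW"],
    "a": ["AA"], "ə": ["AH"],
}

# Length (in code points) of the longest IPA symbol in the table.
MAX_SYMBOL_LEN = max(len(symbol) for symbol in IPA_TO_ARPABET)


def ipa_to_arpabet(ipa: str) -> list[str]:
    """Tokenize an IPA string by greedy longest match and map each token to Arpabet."""
    labels: list[str] = []
    i = 0
    while i < len(ipa):
        # Try the longest candidate first so "kʰ" wins over "k".
        for size in range(min(MAX_SYMBOL_LEN, len(ipa) - i), 0, -1):
            chunk = ipa[i:i + size]
            if chunk in IPA_TO_ARPABET:
                labels.extend(IPA_TO_ARPABET[chunk])
                i += size
                break
        else:
            # No table entry matched at this position.
            raise ValueError(f"no Arpabet mapping for {ipa[i]!r} at position {i}")
    return labels


print(ipa_to_arpabet("kʰam"))  # ['K', 'HH', 'AA', 'M']
```

Longest-match ordering matters here: trying single characters first would split `kʰ` into `k` plus an unmapped `ʰ`.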

Khmer alphabet and pronunciation

The Cambodian script (called Khmer letters) has 33 consonants, 24 dependent vowels, 12 independent vowels, and several diacritic symbols.
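These categories roughly line up with how Unicode names Khmer code points, so a rough classifier can be written with Python's standard `unicodedata` module. The helpers `khmer_char_class` and `count_classes` are hypothetical, and the sketch assumes that the name prefixes `KHMER LETTER`, `KHMER INDEPENDENT VOWEL`, `KHMER VOWEL SIGN` and `KHMER SIGN` are a good enough proxy for consonants, independent vowels, dependent vowels and diacritics.

```python
import unicodedata
from collections import Counter


def khmer_char_class(ch: str) -> str:
    """Classify one code point by its Unicode character name (a rough proxy)."""
    name = unicodedata.name(ch, "")  # "" for code points without a name
    if name.startswith("KHMER LETTER"):
        return "consonant"
    if name.startswith("KHMER INDEPENDENT VOWEL"):
        return "independent vowel"
    if name.startswith("KHMER VOWEL SIGN"):
        return "dependent vowel"
    if name.startswith("KHMER SIGN"):
        return "diacritic"
    return "other"


def count_classes(text: str) -> Counter:
    """Count how many code points of each class appear in the text."""
    return Counter(khmer_char_class(ch) for ch in text)


# កម្ពុជា: 4 consonants, 2 dependent vowel signs, 1 COENG (subscript) sign.
print(count_classes("កម្ពុជា"))
```

Code-point counts need not match the inventory above one-to-one, since Unicode encodes some written vowels as combinations of signs. Likewise, the COENG sign counted here as a diacritic is what stacks the following consonant as a subscript.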

Khmer Keywords Transcription

The Khmer keywords dataset, collected by the Institute of Technology of Cambodia (Khmer: វិទ្យាស្ថាន​បច្ចេកវិទ្យា​កម្ពុជា; ITC), consists of 192 Khmer words and phrases read by fifteen speakers (college students aged 19 to 23) from six different provinces. Below is the transcription of these words by means of transliteration.

As there is no standard way to romanize Khmer words, the transcription table here is not intended to suggest a norm for the method.
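Because no romanization standard exists, it seems safest to treat a transcription table like this one as the source of truth and look entries up rather than derive them by rule. Below is a minimal sketch under that approach. It assumes the table is stored as tab-separated `khmer<TAB>transliteration` rows, which is a hypothetical layout since the dataset's actual file format is not described here, and `load_transcriptions` is a hypothetical helper name. The two sample rows are common romanizations used for illustration, not entries taken from the ITC dataset.

```python
import csv
import io


def load_transcriptions(lines) -> dict[str, str]:
    """Read 'khmer<TAB>transliteration' rows into a dict, rejecting duplicates."""
    table: dict[str, str] = {}
    for row_no, row in enumerate(csv.reader(lines, delimiter="\t"), start=1):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) != 2:
            raise ValueError(f"row {row_no}: expected 2 columns, got {len(row)}")
        khmer, roman = (field.strip() for field in row)
        if khmer in table:
            raise ValueError(f"row {row_no}: duplicate entry for {khmer!r}")
        table[khmer] = roman
    return table


# Illustrative in-memory sample; a real table would be read from a file.
sample = io.StringIO("# khmer\ttransliteration\nកម្ពុជា\tKampuchea\nខ្មែរ\tKhmer\n")
table = load_transcriptions(sample)
print(table["ខ្មែរ"])  # Khmer
```

Rejecting duplicate keys early makes it harder for two conflicting romanizations of the same word to slip into the table unnoticed.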