At a high level, automatic speech recognition (ASR) is a computer application that takes sound waves as input and produces the corresponding spoken words, phrases, or sentences as text. In essence, it is a transcription task that converts spoken articulation into written form. The first step in any speech processing task is to deal with the acoustic properties of the language and establish criteria for pronunciation, so that transcription becomes possible.
The section below roughly describes the pronunciation of the Khmer language and provides transcriptions in IPA and Arpabet format to represent the phonemes of each character.
Khmer Alphabet and Pronunciation
The Cambodian script (Khmer letters) has 33 consonants, 24 dependent vowels, 12 independent vowels, and several diacritics. Most consonants have reduced or modified forms, called sub-consonants, when they occur as the second member of a consonant cluster. Vowels may be written before, after, above, or below a consonant symbol. [1]
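In Unicode-encoded Khmer text, a sub-consonant is not stored as a separate letter: it is the sign COENG (U+17D2) followed by the ordinary consonant, and the renderer draws the pair as the subscript glyph. The minimal sketch below assumes Unicode input and uses only Python's standard `unicodedata` module; it walks a string and flags each consonant that follows COENG. The function name `annotate` is hypothetical.

```python
import unicodedata

COENG = "\u17D2"  # KHMER SIGN COENG: the next consonant is written as a sub-consonant


def annotate(text: str) -> list[tuple[str, str, bool]]:
    """Return (character, Unicode name, is_subscript) for each code point except COENG."""
    result = []
    after_coeng = False
    for ch in text:
        if ch == COENG:
            # COENG has no sound of its own; it only changes the form of the next letter
            after_coeng = True
            continue
        result.append((ch, unicodedata.name(ch, "UNKNOWN"), after_coeng))
        after_coeng = False
    return result


# SA + COENG + RO + vowel sign II, rendered as a single cluster with a subscript RO
for ch, name, is_sub in annotate("\u179F\u17D2\u179A\u17B8"):
    print(name, "(subscript)" if is_sub else "")
```

Because COENG is dropped from the output, a downstream letter-to-sound step sees only pronounceable letters, each tagged with whether it was written as a sub-consonant.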
Consonants
Consonants are divided into two series: the ɑ-series, which carries the inherent vowel /ɑ/, and the ɔ-series, which carries the inherent vowel /ɔ/. The /ɑ/ or /ɔ/ sound comes from an abstract (inherent) vowel, the first of the 24 vowels. In addition, the diacritics ៉ (MUUSIKATOAN) and ៊ (TRIISAP) are used to shift a consonant to the ɑ-series and the ɔ-series, respectively. [2]
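The series rule amounts to a lookup with two overrides. The sketch below is a minimal illustration, assuming each consonant is given as its base letter optionally followed by ៉ (U+17C9, KHMER SIGN MUUSIKATOAN) or ៊ (U+17CA, KHMER SIGN TRIISAP). The base-series sets are read off the consonant table in this section; the function name `consonant_series` and the `"a"`/`"o"` labels are hypothetical.

```python
MUUSIKATOAN = "\u17C9"  # ៉ : moves a consonant into the ɑ-series
TRIISAP = "\u17CA"      # ៊ : moves a consonant into the ɔ-series

# Base series of the 33 consonant letters (without diacritics)
A_SERIES = set("កខចឆដឋថណតបផឡសហអ")
O_SERIES = set("គឃងជឈញឌឍធនទភពមយរលវ")


def consonant_series(consonant: str) -> str:
    """Return "a" (ɑ-series) or "o" (ɔ-series) for a base letter plus optional diacritic."""
    base, marks = consonant[0], consonant[1:]
    # A series-shifting diacritic overrides the letter's base series
    if MUUSIKATOAN in marks:
        return "a"
    if TRIISAP in marks:
        return "o"
    if base in A_SERIES:
        return "a"
    if base in O_SERIES:
        return "o"
    raise ValueError(f"not a Khmer consonant: {consonant!r}")


print(consonant_series("ម"), consonant_series("ម" + MUUSIKATOAN))  # o a
```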
ɑ-series | Subscript | ɔ-series | Subscript | Sound (IPA) | Arpabet |
---|---|---|---|---|---|
ក | ្ក | គ | ្គ | k | K |
ខ | ្ខ | ឃ | ្ឃ | kʰ | KH |
ង៉ |  | ង | ្ង | ŋ | NG |
ច | ្ច | ជ | ្ជ | c | C |
ឆ | ្ឆ | ឈ | ្ឈ | cʰ | CH |
ញ៉ |  | ញ | ្ញ | ɲ | GN |
ដ | ្ដ | ឌ | ្ឌ | ɗ | D |
ឋ, ថ | ្ឋ, ្ថ | ឍ, ធ | ្ឍ, ្ធ | tʰ | TH |
ណ | ្ណ | ន | ្ន | n | N |
ត | ្ត | ទ | ្ទ | t | T |
ប | ្ប | ប៊ |  | ɓ | B |
ផ | ្ផ | ភ | ្ភ | pʰ | PH |
ប៉ |  | ព | ្ព | p | P |
ម៉ |  | ម | ្ម | m | M |
យ៉ |  | យ | ្យ | j | Y |
រ៉ |  | រ | ្រ | r | R |
ឡ | ្ឡ | ល | ្ល | l | L |
វ៉ |  | វ | ្វ | w | W |
ស | ្ស | ស៊ |  | s | S |
ហ | ្ហ | ហ៊ |  | h | HH |
អ | ្អ | អ៊ |  | ʔ |  |
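Since the table gives both series in a row the same Arpabet symbol, converting a consonant to Arpabet reduces to a dictionary lookup keyed on the written form. The minimal sketch below covers only a few rows, and it assumes that a sub-consonant (COENG followed by a letter) is pronounced like the full letter in the same row; the names `CONSONANT_ARPABET` and `consonant_to_arpabet` are hypothetical.

```python
COENG = "\u17D2"        # KHMER SIGN COENG (prefix of a sub-consonant)
MUUSIKATOAN = "\u17C9"  # ៉
TRIISAP = "\u17CA"      # ៊

# A few rows of the consonant table: written form -> Arpabet
CONSONANT_ARPABET = {
    "ក": "K", "គ": "K",
    "ខ": "KH", "ឃ": "KH",
    "ង": "NG", "ង" + MUUSIKATOAN: "NG",
    "ប": "B", "ប" + TRIISAP: "B",
    "ប" + MUUSIKATOAN: "P", "ព": "P",
    "ស": "S", "ស" + TRIISAP: "S",
}


def consonant_to_arpabet(form: str) -> str:
    """Map a consonant, diacritic-marked consonant, or sub-consonant to Arpabet."""
    if form.startswith(COENG):
        form = form[1:]  # assumption: a sub-consonant sounds like its full form
    return CONSONANT_ARPABET[form]


print([consonant_to_arpabet(f) for f in ("ប", "ប" + MUUSIKATOAN, COENG + "ក")])
# ['B', 'P', 'K']
```

The ប/ប៉ pair is why the diacritic is kept as part of the key: on ប it changes not only the series but also the consonant sound.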
Dependent Vowels
The pronunciation of a vowel, including the inherent vowel, is determined by the series of the initial consonant or consonant cluster that it follows. [2]
Letter | ɑ-series (IPA) | Arpabet | ɔ-series (IPA) | Arpabet |
---|---|---|---|---|
Inherent vowel | ɑː | AA | ɔː | OA |
ា | aː | AH | iːə | EA |
ិ | e | EH | i | IH |
ី | ej | EY | iː | IY |
ឹ | ə | OE | ɨ | EO |
ឺ | œː | ER | ɨː | EU |
ុ | o | OH | u | UH |
ូ | ɔːo | OW | uː | UW |
ួ | uːə | UE | uːə | UE |
ើ | aːə | AER | əː | EER |
ឿ | ɨːə | EUR | ɨːə | EUR |
ៀ | iːə | EA | iːə | EA |
េ | eː | IE | eː | IE |
ែ | aːɛ | AE | ɛː | AE |
ៃ | aj | AY | ej | EY |
ោ | aːo | AW | oː | OW |
ៅ | aw | AOW | əw | AUW |
ុំ | om | OUM | um | UM |
ំ | ɑm | OM | um | UM |
ាំ | am | AM | oam | AOM |
ះ | ah | EHX | ɛah | AHX |
ិះ | eh | EEH | ih | IH |
ឹះ | əh | ERH | ɨh | EOH |
ុះ | oh | OUH | uh | UUH |
េះ | eh | OEH | ih | IYH |
ោះ | ɑh | AOH | uəh | UEH |
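Combining the consonant series with this table gives a simple letter-to-sound rule for a consonant followed by a vowel sign. The sketch below is a minimal illustration restricted to a handful of rows; it ignores final consonants and clusters, and the names `VOWEL_ARPABET`, `SERIES`, `ONSET`, and `syllable_to_arpabet` are hypothetical.

```python
# Vowel sign -> (Arpabet after an ɑ-series consonant, Arpabet after an ɔ-series consonant)
VOWEL_ARPABET = {
    "": ("AA", "OA"),        # inherent vowel: no written sign
    "\u17B6": ("AH", "EA"),  # ា
    "\u17B7": ("EH", "IH"),  # ិ
    "\u17B8": ("EY", "IY"),  # ី
    "\u17BB": ("OH", "UH"),  # ុ
    "\u17C3": ("AY", "EY"),  # ៃ
}

# Two consonants with the same sound (K) but different series
SERIES = {"ក": "a", "គ": "o"}
ONSET = {"ក": "K", "គ": "K"}


def syllable_to_arpabet(consonant: str, vowel_sign: str = "") -> str:
    """Pronounce consonant + vowel sign, choosing the vowel reading by the consonant's series."""
    a_reading, o_reading = VOWEL_ARPABET[vowel_sign]
    vowel = a_reading if SERIES[consonant] == "a" else o_reading
    return f"{ONSET[consonant]} {vowel}"


print(syllable_to_arpabet("ក", "\u17B6"))  # K AH
print(syllable_to_arpabet("គ", "\u17B6"))  # K EA
```

The same vowel sign ា yields different Arpabet output purely because ក and គ belong to different series.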
Independent Vowels
Unlike dependent vowels, independent vowels can appear as the initial letter of a word, and they can be followed immediately by consonants but not by dependent vowels.
Letter | IPA | Arpabet |
---|---|---|
ឣា | ʔaː | AH |
ឥ | ʔe | EH |
ឦ | ʔej | EY |
ឧ | ʔu | UH |
ឩ | ʔuː | UW |
ឪ | ʔoːw | AUW |
ឫ | ʔɨ | R EO |
ឬ | ʔɨː | R EU |
ឭ | lɨ | L EO |
ឮ | lɨː | L EU |
ឯ | ʔaːɛ | AE |
ឰ | ʔaj | AY |
ឱ | ʔaːo | AW |
ឲ | ʔaːo | AW |
ឳ | ʔaw | AOW |
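The ordering constraint on independent vowels can be checked mechanically. In Unicode, Khmer independent vowels occupy U+17A3 to U+17B3 and dependent vowel signs occupy U+17B6 to U+17C5; the sketch below assumes those ranges and does not cover signs such as ំ and ះ, which are encoded separately. It flags an independent vowel directly followed by a dependent vowel sign and looks up the Arpabet of a word-initial independent vowel from a few rows of the table. The names `INDEPENDENT_ARPABET`, `is_well_formed`, and `initial_arpabet` are hypothetical.

```python
from typing import Optional


def is_independent_vowel(ch: str) -> bool:
    return "\u17A3" <= ch <= "\u17B3"  # Khmer independent vowels


def is_dependent_vowel(ch: str) -> bool:
    return "\u17B6" <= ch <= "\u17C5"  # Khmer dependent vowel signs


# A few rows of the independent-vowel table: letter -> Arpabet
INDEPENDENT_ARPABET = {
    "\u17A5": "EH",    # ឥ
    "\u17A6": "EY",    # ឦ
    "\u17A7": "UH",    # ឧ
    "\u17AB": "R EO",  # ឫ
    "\u17AF": "AE",    # ឯ
}


def is_well_formed(word: str) -> bool:
    """False if an independent vowel is immediately followed by a dependent vowel sign."""
    return not any(
        is_independent_vowel(ch) and is_dependent_vowel(nxt)
        for ch, nxt in zip(word, word[1:])
    )


def initial_arpabet(word: str) -> Optional[str]:
    """Arpabet of a word-initial independent vowel, or None if it is not in the table subset."""
    return INDEPENDENT_ARPABET.get(word[:1])


print(is_well_formed("\u17AF\u1780"))  # True: ឯ followed by the consonant ក
print(is_well_formed("\u17A5\u17B6"))  # False: ឥ followed by the vowel sign ា
```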
All the tables above are extensions of the letter-to-sound mapping tables developed in [2].
References
- [1] Center for Southeast Asian Studies (Khmer), Northern Illinois University.
- [2] T.R. Annanda, S.M. Long, S. Heng, N. Long, and K.H. Sok, “Complexity of Letter to Sound Conversion (LTS) in Khmer Language: under the context of Khmer Text-to-Speech (TTS),” NLP Lab, Department of Computer and Communication Engineering, Institute of Technology of Cambodia, Cambodia; PAN10 and IDRC Canada.
- [3] Research on Phonetic and Phonological Analysis of Khmer.
- [4] Omniglot, “Khmer.”
- [5] S. Seng, S. Sam, V.-B. Le, B. Bigi, and L. Besacier, “Which Units for Acoustic and Language Modeling for Khmer Automatic Speech Recognition?,” presented at the International Workshop on Spoken Languages Technologies for Under-Resourced Languages, 2008.