Khmer Phonemic Inventory

At a high level, automatic speech recognition (ASR) is a computer application that takes sound waves as input and produces the spoken words, phrases, or sentences as text. It is essentially a transcription task that converts spoken language into written form. The first step in building such a system is to describe the acoustic properties of the language and establish pronunciation rules, which make transcription possible.

The sections below briefly describe the sounds of the Khmer language and give a transcription of each character's phonemes in IPA and Arpabet notation.

Khmer alphabet and pronunciation

In Cambodian script (called Khmer letters) there are 33 consonants, 24 dependent vowels, 12 independent vowels and several diacritic symbols. Most consonants have reduced or modified forms, called sub-consonants, when they occur as the second member of a consonant cluster. Vowels may be written before, after, over, or under a consonant symbol. [1]
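In Unicode, these letter classes occupy contiguous ranges of the Khmer block (U+1780 to U+17FF), and a sub-consonant is encoded as the COENG sign (U+17D2) followed by the base consonant. The sketch below classifies characters on that basis. The function name `khmer_class` and the category labels are illustrative assumptions, not part of any library.

```python
def khmer_class(ch: str) -> str:
    """Classify one character by its position in the Unicode Khmer block."""
    cp = ord(ch)
    if 0x1780 <= cp <= 0x17A2:
        # 35 code points: the 33 modern consonants plus two letters
        # (U+179D, U+179E) used for Pali and Sanskrit transliteration.
        return "consonant"
    if 0x17A3 <= cp <= 0x17B3:
        # Includes U+17A3 and U+17A4, which Unicode marks as deprecated.
        return "independent_vowel"
    if 0x17B6 <= cp <= 0x17C5:
        return "dependent_vowel"
    if cp == 0x17D2:
        # COENG turns the following consonant into a sub-consonant.
        return "coeng"
    if 0x17C6 <= cp <= 0x17D1 or cp == 0x17D3:
        return "sign"
    return "other"


# The word for "Khmer": KHA, COENG, MO, vowel AE, RO
word = "\u1781\u17D2\u1798\u17C2\u179A"
classes = [khmer_class(ch) for ch in word]
```

Note that the 24 dependent vowels counted above include multi-character combinations, so they do not map one-to-one onto the 16 dependent vowel code points.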

Consonants

Consonants can be divided into two series: the ɑ-series, which carries an inherent /ɑ/ sound, and the ɔ-series, which carries an inherent /ɔ/ sound. The /ɑ/ or /ɔ/ sound comes from an abstract (inherent) vowel, the first of the 24 vowels. In addition, the diacritics ៉ (MUUSIKATOAN) and ៊ (TRIISAP) shift a consonant to the ɑ-series and the ɔ-series respectively. [2]

ɑ-series	sub-script	ɔ-series	sub-script	sound (IPA)	Arpabet
ក	្ក	គ	្គ	k	K
ខ	្ខ	ឃ	្ឃ	k^h	KH
ង៉		ង	្ង	ŋ	NG
ច	្ច	ជ	្ជ	c	C
ឆ	្ឆ	ឈ	្ឈ	c^h	CH
ញ៉		ញ	្ញ	ɲ	GN
ដ	្ដ	ឌ	្ឌ	ɗ	D
ឋ, ថ	្ឋ, ្ថ	ឍ, ធ	្ឍ, ្ធ	t^h	TH
ណ	្ណ	ន	្ន	n	N
ត	្ត	ទ	្ទ	t	T
ប	្ប	ប៊		ɓ	B
ផ	្ផ	ភ	្ភ	p^h	PH
ប៉		ព	្ព	p	P
ម៉		ម	្ម	m	M
យ៉		យ	្យ	j	Y
រ៉		រ	្រ	r	R
ឡ	្ឡ	ល	្ល	l	L
វ៉		វ	្វ	w	W
ស	្ស	ស៊		s	S
ហ	្ហ	ហ៊		h	HH
អ	្ឣ	ឣ៊		ʔ
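The consonant table can be expressed as a lookup that also applies the two series-shifting diacritics. This is a minimal sketch covering only a subset of the rows. The names `CONSONANTS` and `consonant_info`, and the series labels `"a"` (ɑ-series) and `"o"` (ɔ-series), are assumptions made for illustration.

```python
MUUSIKATOAN = "\u17C9"  # shifts a consonant to the a-series
TRIISAP = "\u17CA"      # shifts a consonant to the o-series

# base consonant -> (native series, Arpabet); a subset of the table above
CONSONANTS = {
    "\u1780": ("a", "K"),   # KA
    "\u1782": ("o", "K"),   # KO
    "\u1781": ("a", "KH"),  # KHA
    "\u1783": ("o", "KH"),  # KHO
    "\u1784": ("o", "NG"),  # NGO
    "\u1794": ("a", "B"),   # BA
    "\u1796": ("o", "P"),   # PO
    "\u1798": ("o", "M"),   # MO
    "\u179F": ("a", "S"),   # SA
    "\u17A0": ("a", "HH"),  # HA
}


def consonant_info(text: str) -> tuple:
    """Return (series, Arpabet) for a consonant with an optional shifter."""
    base, shifter = text[0], text[1:2]
    series, phone = CONSONANTS[base]
    if shifter == MUUSIKATOAN:
        series = "a"
        if base == "\u1794":
            # The table lists BA + MUUSIKATOAN as the a-series form of /p/.
            phone = "P"
    elif shifter == TRIISAP:
        series = "o"
    return series, phone
```

For example, `consonant_info("\u1798\u17C9")` (MO with MUUSIKATOAN) yields `("a", "M")`, matching the ɑ-series column of the table.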

Dependent Vowels

The pronunciation of a vowel, including the inherent vowel, is determined by the series of the initial consonant or consonant cluster that it follows. [2]

Letter	ɑ-series	Arpabet	ɔ-series	Arpabet
Inherent vowel	ɑː	AA	ɔː	OA
ា	aː	AH	iːə	EA
ិ	e	EH	i	IH
ី	ej	EY	iː	IY
ឹ	ə	OE	ɨ	EO
ឺ	œː	ER	ɨː	EU
ុ	o	OH	u	UH
ូ	ɔːo	OW	uː	UW
ួ	uːə	UE	uːə	UE
ើ	aːə	AER	əː	EER
ឿ	ɨːə	EUR	ɨːə	EUR
ៀ	iːə	EA	iːə	EA
េ	eː	IE	eː	IE
ែ	aːɛ	AE	ɛː	AE
ៃ	aj	AY	ej	EY
ោ	aːo	AW	oː	OW
ៅ	aw	AOW	əw	AUW
ុំ	om	OUM	um	UM
ំ	ɑm	OM	um	UM
ាំ	am	AM	oam	AOM
ះ	ah	EHX	ɛah	AHX
ិះ	eh	EEH	ih	IH
ឹះ	əh	ERH	ɨh	EOH
ុះ	oh	OUH	uh	UUH
េះ	eh	OEH	ih	IYH
ោះ	ɑh	AOH	uəh	UEH
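Because each dependent vowel has one reading per consonant series, the table reduces to a two-column lookup keyed by the vowel sign. The sketch below covers a handful of rows only. The names `VOWELS` and `vowel_phone`, and the use of the empty string for the inherent vowel, are illustrative assumptions.

```python
# vowel sign -> (a-series Arpabet, o-series Arpabet); a subset of the table above
VOWELS = {
    "": ("AA", "OA"),        # inherent vowel (no written sign)
    "\u17B6": ("AH", "EA"),  # sign AA
    "\u17B7": ("EH", "IH"),  # sign I
    "\u17B8": ("EY", "IY"),  # sign II
    "\u17BB": ("OH", "UH"),  # sign U
    "\u17BC": ("OW", "UW"),  # sign UU
}


def vowel_phone(vowel: str, series: str) -> str:
    """Pick the Arpabet reading of a vowel for the given consonant series."""
    if series not in ("a", "o"):
        raise ValueError("series must be 'a' or 'o'")
    a_phone, o_phone = VOWELS[vowel]
    return a_phone if series == "a" else o_phone


# KA (a-series, K) + sign AA versus KO (o-series, K) + sign AA
ka_aa = ["K", vowel_phone("\u17B6", "a")]
ko_aa = ["K", vowel_phone("\u17B6", "o")]
```

Under these table entries, the same written vowel yields `K AH` after an ɑ-series consonant and `K EA` after an ɔ-series consonant.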

Independent Vowels

Unlike dependent vowels, independent vowels can be the initial letter of a word. They can be followed immediately by consonants but not by dependent vowels.

Letter	IPA	Arpabet
ឣា	ʔaː	AH
ឥ	ʔe	EH
ឦ	ʔej	EY
ឧ	ʔu	UH
ឩ	ʔuː	UW
ឪ	ʔoːw	AUW
ឫ	ʔɨ	R EO
ឬ	ʔɨː	R EU
ឭ	lɨ	L EO
ឮ	lɨː	L EU
ឯ	ʔaːɛ	AE
ឰ	ʔaj	AY
ឱ	ʔaːo	AW
ឲ	ʔaːo	AW
ឳ	ʔaw
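The placement rule for independent vowels (they may be followed by a consonant but not by a dependent vowel) can be checked mechanically. The sketch below is one possible check. The function name is hypothetical, and the code deliberately skips the deprecated code points U+17A3 and U+17A4, which is an assumption made here rather than something the tables specify.

```python
def breaks_independent_vowel_rule(word: str) -> bool:
    """Return True if an independent vowel is directly followed by a dependent vowel."""
    for cur, nxt in zip(word, word[1:]):
        # U+17A5..U+17B3: independent vowels (deprecated U+17A3/U+17A4 excluded)
        is_independent = 0x17A5 <= ord(cur) <= 0x17B3
        # U+17B6..U+17C5: dependent vowel signs
        is_dependent = 0x17B6 <= ord(nxt) <= 0x17C5
        if is_independent and is_dependent:
            return True
    return False
```

A sequence such as `"\u17AF\u1780"` (independent vowel QE followed by the consonant KA) passes the check, while `"\u17A5\u17B6"` (independent vowel QI followed by the sign AA) is flagged.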

All of the tables above extend the text-to-sound mapping tables developed in [2].

References

[1] Center for Southeast Asia Studies (Khmer), Northern Illinois University.
[2] T.R. Annanda, S.M. Long, S. Heng, N. Long, K.H. Sok, “Complexity of Letter to Sound Conversion (LTS) in Khmer Language: under the context of Khmer Text-to-Speech (TTS)”. NLP lab, Department of Computer and Communication Engineering, Institute of Technology of Cambodia, Cambodia, PAN10 and IDRC Canada.
[3] Research on Phonetic and Phonological Analysis of Khmer.
[4] Omniglot, Khmer.
[5] S. Seng, S. Sam, V.-B. Le, B. Bigi, and L. Besacier, “Which Units for Acoustic and Language Modeling for Khmer Automatic Speech Recognition?” presented at the International Workshop on Spoken Languages Technologies for Under-Resourced Languages, 2008.

Khmer ASR

Building speech recognition for the Khmer language.