A Rule Based Syllabification Algorithm for Sinhala (285 KB)
Ruvan Weerasinghe, Asanka Wasala and Kumudu Gamage, Language Technology Research Laboratory, University of Colombo School of Computing
This paper presents a study of Sinhala syllable structure and an algorithm for identifying syllables in Sinhala words. After a thorough study of the syllable structure and the linguistic rules for syllabifying Sinhala words, and a survey of the relevant literature, a set of rules was identified and implemented as a simple algorithm. The algorithm was tested on 30,000 distinct words obtained from a corpus, and its output was compared with manual syllabification of the same words. The algorithm achieves 99.95% accuracy.
Sinhala Grapheme to Phoneme Conversion and Rules for Schwa Epenthesis (271 KB)
Asanka Wasala, Ruvan Weerasinghe and Kumudu Gamage, Language Technology Research Laboratory, University of Colombo School of Computing
This paper describes an architecture for converting Sinhala Unicode text into a phonemic specification of pronunciation. The study focused mainly on disambiguating schwa /ə/ and /a/ vowel epenthesis for consonants, which is one of the significant problems found in Sinhala. The problem is addressed by formulating a set of rules. The proposed rules were tested on 30,000 distinct words obtained from a corpus, and the output was compared with phonemic transcriptions of the same words produced manually by an expert. The Grapheme-to-Phoneme (G2P) conversion model achieves 98% accuracy.
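Both abstracts report accuracy by comparing system output with a manually prepared reference for the same word list. The sketch below shows one way such a figure could be computed. It assumes accuracy means the percentage of words whose output exactly matches the reference, which the abstracts do not state explicitly. The function name `word_accuracy` and the placeholder strings are hypothetical and are not taken from either paper.

```python
def word_accuracy(predicted, gold):
    """Percentage of words whose predicted output exactly matches the manual reference.

    Assumption: a word counts as correct only on an exact string match.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty word list")
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return 100.0 * correct / len(gold)


# Hypothetical toy data: placeholder strings, not real Sinhala words.
gold = ["a-b", "c-d-e", "f", "g-h"]
predicted = ["a-b", "c-de", "f", "g-h"]

accuracy = word_accuracy(predicted, gold)
print(f"{accuracy:.2f}%")
```

The same function applies to either task, since it only compares strings: syllabified forms in the first case, phonemic transcriptions in the second.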