University of Colombo School of Computing, Language Technology Research Laboratory


Resources for Developers

Publications

A Rule Based Syllabification Algorithm for Sinhala (285 KB)
Ruvan Weerasinghe, Asanka Wasala and Kumudu Gamage, Language Technology Research Laboratory, University of Colombo School of Computing

Abstract.
This paper presents a study of Sinhala syllable structure and an algorithm for identifying syllables in Sinhala words. After a thorough study of the syllable structure and the linguistic rules for syllabification of Sinhala words, and a survey of the relevant literature, a set of rules was identified and implemented as a simple algorithm. The algorithm was tested on 30,000 distinct words obtained from a corpus, and its output was compared with manual syllabifications of the same words. The algorithm achieves 99.95% accuracy.
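
The evaluation described in the abstract (algorithm output compared against manually syllabified words) can be illustrated with a minimal sketch. It assumes accuracy is measured per word, counting a word as correct only when every syllable boundary matches; the abstract does not state this. The function name `word_accuracy` and the romanized toy words are hypothetical and are not taken from the paper.

```python
def word_accuracy(predicted, gold):
    """Return the fraction of words whose predicted syllabification
    exactly matches the manual (gold) syllabification.

    Assumption: accuracy is word-level exact match.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("no words to evaluate")
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Hypothetical toy data: each word is a list of syllables.
# These romanized placeholders are illustrative only.
gold = [["ka", "ra"], ["am", "maa"], ["pot"], ["gas"]]
predicted = [["ka", "ra"], ["a", "mmaa"], ["pot"], ["gas"]]

print(word_accuracy(predicted, gold))  # 0.75 (3 of 4 words match)
```

Under the same word-level assumption, 99.95% accuracy on 30,000 words would correspond to roughly 15 mismatched words.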


Sinhala Grapheme to Phoneme Conversion and Rules for Schwa Epenthesis (271 KB)
Asanka Wasala, Ruvan Weerasinghe and Kumudu Gamage, Language Technology Research Laboratory, University of Colombo School of Computing

Abstract.
This paper describes an architecture for converting Sinhala Unicode text into a phonemic specification of pronunciation. The study focused mainly on disambiguating schwa /ə/ and /a/ vowel epenthesis for consonants, which is one of the significant problems found in Sinhala. This problem has been addressed by formulating a set of rules. The proposed rules were tested on 30,000 distinct words obtained from a corpus, and the output was compared with phonemic transcriptions of the same words produced manually by an expert. The Grapheme-to-Phoneme (G2P) conversion model achieves 98% accuracy.



© 2006-2007 by Language Technology Research Laboratory, University of Colombo School of Computing, Sri Lanka
Last Updated On: 28th November 2006