Language Technology Research Laboratory
University of Colombo School of Computing
PAN Localization Project - Phase II
Five main tasks are identified under the Phase II proposal:
-
Regional Tasks
-
Building a Machine Assisted Translation tool
-
Developing a Handwriting recognition system on mobile devices
-
Developing an effective training methodology and materials for local language teaching and learning
-
Providing training on Sinhala web content development
-
Regional tasks
This activity is proposed in collaboration with other partners in order to build up a repository of tools and resources across the language groups covered by the project. It is divided into several sub-tasks, some of which are supported by the corpus collected in Phase I.
-
Parallel Corpus
This sub-task is concerned with building a Sinhala-English parallel corpus. Partners will agree on a common English corpus, for which translations into all participating languages will be developed. The result would be a rich resource for various kinds of inter-language processing work, including Machine Translation.
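The benefit of a common English corpus can be illustrated with a small sketch: once every language is aligned to the same English segments, any two languages can be paired through English. The class name, segment IDs, the language code "xx" and the placeholder texts below are hypothetical and are not taken from the project.

```python
from collections import defaultdict


class PivotCorpus:
    """Translations of a shared English corpus, keyed by segment ID."""

    def __init__(self):
        self.english = {}                      # segment id -> English text
        self.translations = defaultdict(dict)  # language -> {segment id -> text}

    def add_english(self, seg_id, text):
        self.english[seg_id] = text

    def add_translation(self, lang, seg_id, text):
        # A translation must refer to a segment of the common English corpus.
        if seg_id not in self.english:
            raise KeyError(f"unknown English segment: {seg_id}")
        self.translations[lang][seg_id] = text

    def _texts(self, lang):
        # "en" is the pivot; every other language is stored as a translation.
        return self.english if lang == "en" else self.translations.get(lang, {})

    def pairs(self, lang_a, lang_b):
        """Return (seg_id, text_a, text_b) for segments present in both languages."""
        a, b = self._texts(lang_a), self._texts(lang_b)
        return [(seg_id, a[seg_id], b[seg_id]) for seg_id in sorted(set(a) & set(b))]


# Placeholder data only; real entries would hold actual translations.
corpus = PivotCorpus()
corpus.add_english(1, "Hello")
corpus.add_english(2, "Thank you")
corpus.add_translation("si", 1, "<Sinhala text 1>")
corpus.add_translation("si", 2, "<Sinhala text 2>")
corpus.add_translation("xx", 2, "<partner-language text 2>")
print(corpus.pairs("si", "xx"))
```

Keying every translation to an English segment ID means a partner only needs to align with English, yet still gains aligned pairs with every other participating language.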
-
5000 word local WordNet
The primary aim of this sub-component is to build a Sinhala WordNet. The task will begin with identifying, jointly with the other partner countries, a list of 5000 English words. The next step is to translate these words into the local language (Sinhala in our case) with the appropriate linguistic features, including part-of-speech (POS) tags. The final aim is to compile these words with WordNet senses in order to form a valuable linguistic resource for various NLP tasks.
-
Localized URL
As localized web pages are developed, having to use non-local URLs to access them becomes a barrier for the average native user. To overcome this problem, this sub-task is concerned with building a framework to represent URLs in local languages.
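One existing mechanism that such a framework could build on is the Punycode encoding used for internationalized domain names, which maps a non-ASCII host label to an ASCII form prefixed with "xn--". The sketch below is only an illustration under that assumption, not the project's framework: the helper names and the host name are hypothetical, only the host part of a URL is handled, and the normalization and validation steps required by the IDNA standards are skipped.

```python
def label_to_ascii(label: str) -> str:
    """Encode one host label; ASCII labels pass through unchanged."""
    if label.isascii():
        return label.lower()
    # Python's built-in "punycode" codec implements the raw encoding only.
    return "xn--" + label.encode("punycode").decode("ascii")


def label_to_unicode(label: str) -> str:
    """Decode one host label back to its local-script form."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


def host_to_ascii(host: str) -> str:
    return ".".join(label_to_ascii(part) for part in host.split("."))


def host_to_unicode(host: str) -> str:
    return ".".join(label_to_unicode(part) for part in host.split("."))


# Hypothetical host name mixing an ASCII label with a Sinhala label.
local_host = "example.ලංකා"
ascii_host = host_to_ascii(local_host)
print(ascii_host)
print(host_to_unicode(ascii_host) == local_host)
```

The user sees and types the local-script form, while the ASCII form is what existing name-resolution infrastructure can carry.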
-
Build a Machine Assisted Translation tool
One of the key enabling technologies for wide access to ICT is the availability of content in one's native tongue. With the vast amount of information already available on the web in English and other non-local languages, translation becomes crucially important if citizens of countries such as Sri Lanka are to benefit. This task is concerned with making this process faster for human translators, with much of the routine translation done automatically. The approach is based on Example-Based Machine Translation: a mechanism for sharing Translation Memories will be built so that translators have access to the experience of other translators.
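The idea of a Translation Memory can be sketched as a store of previously translated segments that is searched for the closest match to a new source sentence. The example below is a minimal illustration, not the project's tool: the class name, the 0.7 threshold and the placeholder target strings are assumptions, and the similarity measure is simply the one provided by Python's standard difflib module.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Stores (source, target) segment pairs and retrieves fuzzy matches."""

    def __init__(self):
        self._entries = []  # list of (source, target) pairs

    def add(self, source, target):
        self._entries.append((source, target))

    def lookup(self, query, threshold=0.7):
        """Return (score, source, target) for the best match, or None."""
        best = None
        for source, target in self._entries:
            # ratio() is 1.0 for identical strings and falls towards 0.0
            # as the strings share less in common.
            score = SequenceMatcher(None, query.lower(), source.lower()).ratio()
            if score >= threshold and (best is None or score > best[0]):
                best = (score, source, target)
        return best


# Placeholder targets; a real memory would hold translators' actual output.
tm = TranslationMemory()
tm.add("Save the file", "<Sinhala translation of 'Save the file'>")
tm.add("Open the file", "<Sinhala translation of 'Open the file'>")

match = tm.lookup("Save the files")
if match is not None:
    score, source, target = match
    print(f"{score:.2f} {source!r} -> {target}")
```

A translator is offered the stored translation of the nearest earlier sentence and only edits the difference; sharing the memory extends this to work done by other translators.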
-
Develop Handwriting recognition on mobile devices
With the ever-increasing importance of the stylus as an input device for handheld devices, online handwriting recognition is becoming crucial. The main objective of this task is to develop an easy-to-use, Graffiti-style solution for recognizing Sinhala characters on different mobile platforms.
-
Develop an effective training methodology and materials for local language teaching and learning
Language is a very powerful tool for mutual understanding between people. Lack of knowledge of another's language, on the other hand, has been the cause of many misunderstandings and even wars. Sri Lanka's ethnic conflict has roots in language, among other things. No local language project can ignore the strategic opportunity that technology provides to scale the teaching and learning of another language. The aim of this task is therefore to develop effective training materials and methodology that make learning another language less arduous. The framework is expected to be flexible enough for other project partners to extend it to teach their own languages.
-
Provide training on Sinhala web content development
The distribution of content on the World Wide Web across the languages of the world does not accurately reflect the number of users of those languages. The languages of the partner countries are grossly underrepresented. To help correct this imbalance, this component is designed to encourage the publishing of local language content on the web. Apart from the technologies surrounding Unicode, the training will cover methods of content publishing, ranging from web site development to uploading content to blogs and wikis.
© Language Technology Research Laboratory, 2011
Last updated on 14 December 2011