Thursday, April 22, 2004

Indian Language Technologies has a major project to develop Devanagari OCR (Optical Character Recognition). It is part of the Center of Excellence for Document Analysis and Recognition (CEDAR) at the State University of New York at Buffalo. The project is sponsored by the National Science Foundation (NSF), US, and the Indian Statistical Institute, Calcutta, India. A sample truthing tool is available for download, along with several OCR samples. The output is generated as an ITRANS-encoded text file. The effort is led by Dr. Venugopal Govindaraju.

The Center for Development of Advanced Computing (CDAC), India, has also developed, among many excellent commercial products, an OCR package called Chitrankan. The software currently handles Devanagari with embedded English text and has the potential to be extended to other Indian languages.

The BharatiyaBhasha multilingual dictionary, consisting of nearly 5,000 common words in 14 different languages, is available for download. Quite a few other tools have been developed at Technology Development for Indian Languages.

The "Hindi Vishva Kosh," a Hindi pictorial encyclopaedia, is available at http://www.erdcifast.net/vishwa/vishwa/homepage.asp. It is sponsored by the Government of India. (The connection is slow.)
