Transliteration from a source language to Roman (English) script


We need the Romanization feature badly. Can someone please help? We want to transliterate (not translate) Hindi text from Devanagari script to Roman (Latin) script.

Input
romanize_text('अंतिम लक्ष्य क्या है')

Expected Output
'antim lakshya kya hai'

As per the Google Romanize text docs, I wrote the following Python code to transliterate from some language script to Roman script.

# Authenticate using credentials.
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "translate.json"

PROJECT_ID = "project-id"
LOCATION = "global"

# Imports the Google Cloud Translation library
from google.cloud import translate_v3

# Transliteration.
def romanize_text(text, src_lang="hi", tgt_lang="en"):

    client = translate_v3.TranslationServiceClient()
    parent = f"projects/{PROJECT_ID}/locations/{LOCATION}"

    response = client.romanize_text(
        request={
            "parent": parent,
            "contents": [text],
            "source_language_code": src_lang,
            "target_language_code": tgt_lang,
        }
    )

    # Display the romanized text for each input provided
    for romanization in response.romanizations:
        print(f"Romanized text: {romanization.romanized_text}")

romanize_text('अंतिम लक्ष्य क्या है')

Running the above code gives the following error:

AttributeError: 'TranslationServiceClient' object has no attribute 'romanize_text'

Also, in Google's API reference for romanizeText, the API Explorer panel on the right-hand side is broken, whereas it works correctly for any other method selected from the left-hand side.

Either a solution to the problem above or an alternative non-Google solution for romanization would be fine.

There is 1 answer

Kyle F. Hartzenberg

You get this error when your function calls client.romanize_text because the client's source code defines no romanize_text method.
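A quick way to confirm this locally, before making any API call, is to check whether the installed client class exposes the method. The following is only a minimal sketch in generic Python: the helper names client_supports and installed_version are hypothetical, and the google-cloud-translate import is guarded so the script still runs when the package is absent.

```python
from importlib import metadata


def client_supports(client_cls, method_name):
    """Return True if client_cls has a callable attribute named method_name."""
    return callable(getattr(client_cls, method_name, None))


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("google-cloud-translate:", installed_version("google-cloud-translate"))
    try:
        from google.cloud import translate_v3
    except ImportError:
        print("google-cloud-translate is not installed")
    else:
        cls = translate_v3.TranslationServiceClient
        print("romanize_text available:", client_supports(cls, "romanize_text"))
```

If the last line prints False, the AttributeError above is expected for that installation.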

The transliteration documentation for "advanced translating text v3" says that:

Transliteration is a configuration setting in the translateText method. When you enable transliteration, you translate romanized text (Latin script) directly to a target language.

However, you want the opposite direction, from text in its native script to romanized text, and this does not seem to be available (yet) via the Google Cloud Translate API. This Stack Overflow answer to a question similar to yours points to the same conclusion.
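For comparison, the direction the documentation does describe (romanized input translated to a target language) might be requested roughly as sketched below. This is an assumption-laden illustration, not a verified recipe: the helper build_transliteration_request is hypothetical, and the field names transliteration_config and enable_transliteration reflect my understanding of the v3 request message and should be checked against the version you have installed. The sketch only builds the request dictionary, so it makes no network call.

```python
def build_transliteration_request(project_id, text, source_lang, target_lang,
                                  location="global"):
    """Build a translateText request dict with transliteration enabled.

    Field names are assumed to match the v3 TranslateTextRequest message.
    """
    return {
        "parent": f"projects/{project_id}/locations/{location}",
        "contents": [text],
        "mime_type": "text/plain",
        "source_language_code": source_lang,
        "target_language_code": target_lang,
        # Assumed field: treats the input as romanized (Latin-script) text.
        "transliteration_config": {"enable_transliteration": True},
    }


request = build_transliteration_request(
    "project-id", "antim lakshya kya hai", source_lang="hi", target_lang="en"
)

# With credentials configured, the request would be sent along these lines:
#   from google.cloud import translate_v3
#   client = translate_v3.TranslationServiceClient()
#   response = client.translate_text(request=request)
```

Note that this takes romanized Hindi as input; it does not produce romanized output, which is what the question asks for.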

The PyPI package ai4bharat-transliteration, from the researchers at AI4Bharat, looks like a viable non-Google alternative for transliterating Hindi to romanized text.
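A minimal sketch of how that package might be used is shown below. The XlitEngine constructor arguments and the translit_sentence call follow my reading of the package README and should be treated as assumptions to verify against the installed version; the helper names make_engine and romanize_with_engine are hypothetical. The package import is done lazily so the helper can be exercised with any object that offers translit_sentence.

```python
def make_engine():
    """Create an Indic-to-Roman engine (assumes ai4bharat-transliteration is installed)."""
    # Imported lazily so the rest of this sketch works without the package.
    from ai4bharat.transliteration import XlitEngine

    # Assumed arguments, taken from the package README.
    return XlitEngine(src_script_type="indic", beam_width=10, rescore=False)


def romanize_with_engine(text, engine, lang_code="hi"):
    """Romanize text using any engine that offers translit_sentence()."""
    return engine.translit_sentence(text, lang_code=lang_code)


# Usage, once the package is installed (model files may be fetched on first use):
#   engine = make_engine()
#   print(romanize_with_engine("अंतिम लक्ष्य क्या है", engine))
```

Keeping the engine as a parameter means it is created once and reused across calls, which matters if loading the model is slow.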