DOCUMENT_TEXT_DETECTION API: Incorrect Japanese Character Recognition

Question

Asked by OyaDuck on 11 March 2024 at 09:01

We use the Vision API's OCR feature (DOCUMENT_TEXT_DETECTION). Since around 9:00 AM JST on March 8, 2024, we have noticed that some Japanese (JA) text is being returned as old character forms (kyūjitai) instead of the standard modern forms.

For example, the character "内" (nai, U+5185) is being recognized as the old form "內" (U+5167). The substitution is inconsistent: not every character that has an old form is affected, and sometimes the standard character is returned.
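To make the difference concrete, here is a minimal Python sketch showing that the two forms are distinct Unicode code points and that standard Unicode normalization does not fold one into the other. The mapping table `OLD_TO_STANDARD` and the helper `normalize_old_forms` are hypothetical names, and the table holds only the one pair mentioned above. A real workaround would need a vetted list of character pairs, which we do not have.

```python
import unicodedata

STANDARD = "\u5185"  # the standard form of "nai"
OLD_FORM = "\u5167"  # the old form returned by the OCR

def code_point(ch: str) -> str:
    """Format a single character as a U+XXXX code point string."""
    return f"U+{ord(ch):04X}"

# NFKC leaves the old form untouched, so normalization alone is not a fix.
nfkc_changes_it = unicodedata.normalize("NFKC", OLD_FORM) != OLD_FORM

# Hypothetical post-processing table: one entry only, for illustration.
OLD_TO_STANDARD = str.maketrans({OLD_FORM: STANDARD})

def normalize_old_forms(text: str) -> str:
    """Replace known old forms with their standard counterparts."""
    return text.translate(OLD_TO_STANDARD)

if __name__ == "__main__":
    print(code_point(STANDARD), code_point(OLD_FORM), nfkc_changes_it)
    print(normalize_old_forms("\u6848" + OLD_FORM))
```

Blanket replacement like this would also rewrite old forms that genuinely appear in the source image (for instance in personal names), so it only suits documents known to use standard forms.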

This issue did not occur before that date. In addition, once a document has been returned with old characters after March 8, 2024, running recognition on the same document again still returns a mix of old and new characters.

We have checked the locale returned in the response. Initially, we thought the issue affected only results with the "und" locale, but we have confirmed that it also occurs with the "ja" locale.
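The check described above can be sketched in Python roughly as follows. The sketch assumes one element of the `responses` array from `images:annotate`, reads the locale from the first `textAnnotations` entry, and scans `fullTextAnnotation.text` for old forms. The watch list `OLD_FORMS` and the function `summarize` are hypothetical, and the sample response is hand-written for illustration, not real API output.

```python
# Hypothetical watch list of old forms; only the one pair we have observed.
OLD_FORMS = {"\u5167"}

def summarize(response: dict) -> dict:
    """Report the locale and the positions of old-form characters.

    `response` is assumed to be one element of the "responses" array.
    """
    annotations = response.get("textAnnotations", [])
    locale = annotations[0].get("locale", "") if annotations else ""
    text = response.get("fullTextAnnotation", {}).get("text", "")
    hits = [(i, ch) for i, ch in enumerate(text) if ch in OLD_FORMS]
    return {"locale": locale, "old_form_hits": hits}

if __name__ == "__main__":
    # Hand-written sample, not an actual API response.
    sample = {
        "textAnnotations": [{"locale": "ja", "description": "\u6848\u5167"}],
        "fullTextAnnotation": {"text": "\u6848\u5167\n"},
    }
    print(summarize(sample))
```

Logging this summary per document would show whether the old forms correlate with the reported locale or appear regardless of it.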

Has there been a recent change to the internal algorithm?

If there is any solution to this problem, please let us know.

Thank you in advance for your help.

Additional Information:

Language: Japanese (JA)
OS: Windows
ENDPOINT: https://vision.googleapis.com/v1/images:annotate
SDK: REST

Reproducible Body:

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "CLOUD_STORAGE_IMAGE_URI"
        }
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ]
    }
  ]
}
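For anyone reproducing this from a script, the same body can be built and sent with the Python standard library, as in the sketch below. The function names `build_body` and `annotate` are hypothetical, and authenticating with an API key in the `key` query parameter is an assumption about the setup. The optional `imageContext.languageHints` field is part of the request schema, but we have not confirmed that passing `"ja"` changes which character form is returned.

```python
import json
import urllib.request

ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

def build_body(image_uri: str, language_hints=None) -> dict:
    """Build the images:annotate body used in this question."""
    request = {
        "image": {"source": {"imageUri": image_uri}},
        "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
    }
    if language_hints:
        # Optional hint; its effect on old vs. standard forms is untested.
        request["imageContext"] = {"languageHints": list(language_hints)}
    return {"requests": [request]}

def annotate(body: dict, api_key: str) -> dict:
    """POST the body to the endpoint (assumes API-key authentication)."""
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Prints the request body only; no network call is made here.
    print(json.dumps(build_body("CLOUD_STORAGE_IMAGE_URI", ["ja"]), indent=2))
```

Calling `build_body` without hints yields exactly the body shown above, so results with and without the hint can be compared on the same image.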

Expected Output: