How to highlight custom extractions using a2i's crowd-textract-analyze-document?

Asked by Amnon on 11 October 2020 at 10:29

I would like to create a human review loop for images that have undergone OCR with Amazon Textract and entity extraction with Amazon Comprehend.

My process is:

1. Send the image to Textract to extract the text.
2. Send the text to Comprehend to extract entities.
3. In Textract's output, find the IDs of the blocks that correspond to the entities extracted by Comprehend.
4. Add new blocks of type KEY_VALUE_SET to Textract's JSON output, following the docs.
5. Create a human task whose template contains the crowd-textract-analyze-document element, and feed it the modified Textract output.
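
Steps 3 and 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the asker's code or an AWS API: it assumes the text sent to Comprehend was built by joining Textract's WORD blocks with single spaces (if you joined LINE blocks or used newlines, the offset bookkeeping must match that instead). It maps Comprehend's `BeginOffset`/`EndOffset` to WORD block IDs and emits a KEY/VALUE pair of `KEY_VALUE_SET` blocks, in Textract's own UpperCamelCase format, whose geometry is the union of the matched words' bounding boxes. The function names are hypothetical, and labelling the KEY block with the Comprehend entity type (with no CHILD words of its own) is one possible design choice, not something the Textract or A2I docs prescribe. The Textract and Comprehend calls themselves (steps 1 and 2) are left out so the sketch runs without AWS credentials.

```python
import uuid
from typing import Any, Dict, List, Tuple

Block = Dict[str, Any]
Span = Tuple[int, int, str]  # (start, end, WORD block Id), end exclusive


def word_spans(blocks: List[Block]) -> Tuple[str, List[Span]]:
    """Join WORD texts with single spaces (ASSUMED to be exactly the text
    sent to Comprehend) and record each word's [start, end) character span."""
    spans: List[Span] = []
    parts: List[str] = []
    offset = 0
    for block in blocks:
        if block.get("BlockType") != "WORD":
            continue
        if parts:
            offset += 1  # account for the joining space
        spans.append((offset, offset + len(block["Text"]), block["Id"]))
        parts.append(block["Text"])
        offset += len(block["Text"])
    return " ".join(parts), spans


def words_for_entity(entity: Dict[str, Any], spans: List[Span]) -> List[str]:
    """Return the Ids of WORD blocks whose span overlaps the entity's offsets."""
    begin, end = entity["BeginOffset"], entity["EndOffset"]
    return [bid for start, stop, bid in spans if start < end and stop > begin]


def union_geometry(words: List[Block]) -> Dict[str, Any]:
    """Smallest axis-aligned box (in Textract's 0-1 page ratios) covering all words."""
    boxes = [w["Geometry"]["BoundingBox"] for w in words]
    left = min(b["Left"] for b in boxes)
    top = min(b["Top"] for b in boxes)
    right = max(b["Left"] + b["Width"] for b in boxes)
    bottom = max(b["Top"] + b["Height"] for b in boxes)
    return {
        "BoundingBox": {"Width": right - left, "Height": bottom - top,
                        "Left": left, "Top": top},
        "Polygon": [{"X": left, "Y": top}, {"X": right, "Y": top},
                    {"X": right, "Y": bottom}, {"X": left, "Y": bottom}],
    }


def entity_to_key_value(entity: Dict[str, Any], spans: List[Span],
                        blocks_by_id: Dict[str, Block]) -> List[Block]:
    """Build a KEY block labelled with the entity type (a hypothetical labelling
    choice) and a VALUE block whose CHILD relationship points at the entity's
    WORD blocks. Returns [] if the entity overlaps no words."""
    word_ids = words_for_entity(entity, spans)
    if not word_ids:
        return []
    words = [blocks_by_id[i] for i in word_ids]
    confidence = entity["Score"] * 100  # Comprehend scores 0-1, Textract uses 0-100
    key_id, value_id = str(uuid.uuid4()), str(uuid.uuid4())
    key = {"BlockType": "KEY_VALUE_SET", "Confidence": confidence,
           "Geometry": union_geometry(words), "Id": key_id,
           "Relationships": [{"Type": "VALUE", "Ids": [value_id]}],
           "EntityTypes": ["KEY"], "Text": entity["Type"]}
    value = {"BlockType": "KEY_VALUE_SET", "Confidence": confidence,
             "Geometry": union_geometry(words), "Id": value_id,
             "Relationships": [{"Type": "CHILD", "Ids": word_ids}],
             "EntityTypes": ["VALUE"], "Text": entity["Text"]}
    return [key, value]
```

The resulting blocks would be appended to the `Blocks` list of the Textract response before it is handed to the human task.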

Step 5 is where the process fails: my custom entities are not rendered properly. Specifically, the entities are not highlighted on the image when I click them in the sidebar, and there is no error in the browser's console.

Has anyone tried such a thing?

Sorry for not including examples. I will remove secrets and PII from my files and attach them to the question.


**Amnon** · Accepted Answer · 2020-10-18T09:15:33+00:00

I used the AWS documentation for the a2i-crowd-textract-detection human task element to generate the value of its initialValue attribute, and it appears that the documentation for that attribute is incorrect. While the documentation shows that the value should be in the same format as Textract's output, namely:

```json
[
    {
        "BlockType": "KEY_VALUE_SET",
        "Confidence": 38.43309020996094,
        "Geometry": { ... },
        "Id": "8c97b240-0969-4678-834a-646c95da9cf4",
        "Relationships": [
            { "Type": "CHILD", "Ids": [...] },
            { "Type": "VALUE", "Ids": [...] }
        ],
        "EntityTypes": ["KEY"],
        "Text": "Foo bar"
    }
]
```

the a2i-crowd-textract-detection expects the input to have lowerCamelCase attribute names (rather than UpperCamelCase). For example:

```json
[
    {
        "blockType": "KEY_VALUE_SET",
        "confidence": 38.43309020996094,
        "geometry": { ... },
        "id": "8c97b240-0969-4678-834a-646c95da9cf4",
        "relationships": [
            { "type": "CHILD", "ids": [...] },
            { "type": "VALUE", "ids": [...] }
        ],
        "entityTypes": ["KEY"],
        "text": "Foo bar"
    }
]
```
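
If the task input is built in Python, the UpperCamelCase to lowerCamelCase conversion described above can be done with a small recursive helper. This is a sketch, not part of any AWS SDK: the function names are hypothetical, and it assumes that every key, including nested ones such as those inside `Geometry` (`BoundingBox`, `Polygon`, `X`, `Y`), should be converted, which the answer only demonstrates for some of the keys. String values such as `"KEY_VALUE_SET"` or `"CHILD"` are left untouched.

```python
import json
from typing import Any


def lower_camel(name: str) -> str:
    """Lower-case only the first character: 'BlockType' -> 'blockType', 'Id' -> 'id'."""
    return name[:1].lower() + name[1:]


def to_lower_camel_keys(value: Any) -> Any:
    """Return a copy of a Textract-style structure with every dict key (at any
    depth, an ASSUMPTION about what the element expects) converted to
    lowerCamelCase. Non-key values are returned unchanged."""
    if isinstance(value, dict):
        return {lower_camel(key): to_lower_camel_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_lower_camel_keys(item) for item in value]
    return value


if __name__ == "__main__":
    block = {"BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
             "Relationships": [{"Type": "VALUE", "Ids": ["value-block-id"]}]}
    print(json.dumps(to_lower_camel_keys([block])))
    # prints [{"blockType": "KEY_VALUE_SET", "entityTypes": ["KEY"], "relationships": [{"type": "VALUE", "ids": ["value-block-id"]}]}]
```

The converted list could then be serialized with `json.dumps` and passed as part of the `InputContent` string in the `HumanLoopInput` argument of the `sagemaker-a2i-runtime` client's `start_human_loop` call; how the template reads it back into the element's attribute depends on your own template.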

I opened a support case with AWS about this documentation error.
