I would like to create a human review loop for images that have undergone OCR using Amazon Textract and entity extraction using Amazon Comprehend.
My process is:
- send the image to Textract to extract the text
- send the text to Comprehend to extract entities
- find the Block IDs, in Textract's output, of the entities extracted by Comprehend
- add new Blocks of type `KEY_VALUE_SET` to Textract's JSON output, per the docs
- create a Human Task with the `crowd-textract-analyze-document` element in the template and feed it the modified Textract output
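To make steps 3 and 4 concrete, here is a minimal sketch in Python that works on an already-parsed Textract `Blocks` list. The Textract and Comprehend calls themselves are not shown. The helper names (`find_word_block_ids`, `union_bounding_box`, `make_key_value_blocks`) are hypothetical, the matching is a naive exact token match, and the block layout is an assumption based on the general shape of Textract's `KEY_VALUE_SET` blocks (the `Polygon` geometry and any way of attaching the entity label to the KEY block are left out), so treat it as an illustration rather than what the review UI is known to require.

```python
import uuid


def find_word_block_ids(blocks, entity_text):
    """Return the Ids of the first run of consecutive WORD blocks that spell entity_text."""
    words = [b for b in blocks if b.get("BlockType") == "WORD"]
    tokens = entity_text.split()
    if not tokens:
        return []
    for start in range(len(words) - len(tokens) + 1):
        window = words[start:start + len(tokens)]
        if [w["Text"] for w in window] == tokens:
            return [w["Id"] for w in window]
    return []


def union_bounding_box(blocks, ids):
    """Smallest box covering the given blocks, in Textract's ratio coordinates."""
    boxes = [b["Geometry"]["BoundingBox"] for b in blocks if b["Id"] in ids]
    left = min(b["Left"] for b in boxes)
    top = min(b["Top"] for b in boxes)
    right = max(b["Left"] + b["Width"] for b in boxes)
    bottom = max(b["Top"] + b["Height"] for b in boxes)
    return {"Left": left, "Top": top, "Width": right - left, "Height": bottom - top}


def make_key_value_blocks(blocks, word_ids, confidence=100.0):
    """Build a KEY block and a VALUE block for one entity (assumed layout)."""
    if not word_ids:
        raise ValueError("entity text was not found among the WORD blocks")
    box = union_bounding_box(blocks, word_ids)
    value_block = {
        "BlockType": "KEY_VALUE_SET",
        "EntityTypes": ["VALUE"],
        "Id": str(uuid.uuid4()),
        "Confidence": confidence,
        "Geometry": {"BoundingBox": dict(box)},
        # The VALUE block points at the words that make up the entity.
        "Relationships": [{"Type": "CHILD", "Ids": list(word_ids)}],
    }
    key_block = {
        "BlockType": "KEY_VALUE_SET",
        "EntityTypes": ["KEY"],
        "Id": str(uuid.uuid4()),
        "Confidence": confidence,
        "Geometry": {"BoundingBox": dict(box)},
        # The KEY block points at its VALUE block.
        "Relationships": [{"Type": "VALUE", "Ids": [value_block["Id"]]}],
    }
    return key_block, value_block
```

The two returned blocks would then be appended to the `Blocks` list before it is handed to the Human Task. Entities that span a line break or differ in punctuation from the OCR tokens will not match with this naive comparison and would need fuzzier matching.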
What fails to work in this process is step 5. My custom entities are not rendered properly. By "fails to work" I mean that the entities are not highlighted on the image when I click them on the sidebar. There is no error in the browser's console.
Has anyone tried such a thing?
Sorry for not including examples. I will remove secrets/PII from my files and attach them to the question.
I used the AWS documentation of the `a2i-crowd-textract-detection` human task element to generate the value of the `initialValue` attribute. It appears the doc for that attribute is incorrect. While the doc shows that the value should be in the same format as the output of Textract, `a2i-crowd-textract-detection` expects the input to have lowerCamelCase attribute names (rather than UpperCamelCase).

I opened a support case with AWS about this documentation error.
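For anyone hitting the same problem, a minimal sketch of the key renaming in Python follows. The function name `lower_camel_keys` is hypothetical, and the sketch only lowercases the first letter of every attribute name at every nesting level. It leaves values such as `KEY_VALUE_SET` alone, and it assumes nothing else about the shape the element expects.

```python
def lower_camel_keys(value):
    """Recursively rename dict keys from UpperCamelCase to lowerCamelCase."""
    if isinstance(value, dict):
        return {
            key[:1].lower() + key[1:]: lower_camel_keys(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [lower_camel_keys(item) for item in value]
    # Strings, numbers, booleans and None are returned unchanged.
    return value
```

Applied to a Textract block, `BlockType` becomes `blockType`, `Geometry.BoundingBox.Left` becomes `geometry.boundingBox.left`, and so on. The result can then be serialized with `json.dumps` and passed as `initialValue`.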