InvoiceParser: errors when uptraining a new version after activating invoice_type


We are using Google Cloud Document AI to parse invoices. We recently enabled the invoice type feature and relabeled all documents with the labeling feature so that every invoice has an invoice_type. However, when we try to uptrain, we get the following error:

{
    "code": 3,
    "message": "Invalid document.",
    "details": [
        {
            "@type": "type.googleapis.com/google.rpc.ErrorInfo",
            "reason": "INVALID_DOCUMENT",
            "domain": "documentai.googleapis.com",
            "metadata": {
                "document": "gs://documentai-dataset/training/invoice_correction_102242.pdf",
                "annotation_name": "invoice_type",
                "num_fields_needed": "1",
                "field_name": "entities.text_anchor.text_segments",
                "num_fields": "0"
            }
        }
    ]
}

The error occurs for many documents, but not for all of them. Is there a reason why this happens?

We relabeled all documents, and invoice_type is set by classification, so the entity looks like this:

{
    "id": "a47c19621d45f83d",
    "normalizedValue": {
        "text": "invoice_statement"
    },
    "type": "invoice_type"
}

All documents that produce the error have the value in exactly this form. The entity has no text_anchor element at all, and it is not possible to set it as a text anchor. Does anybody have an idea why this is happening?
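One way to see which training documents trip this check is to scan the exported Document JSON for invoice_type entities that have no textAnchor.textSegments, which is the condition the error metadata describes (num_fields_needed "1", num_fields "0" for entities.text_anchor.text_segments). The sketch below is a hypothetical helper, not part of any Google library, and it assumes the documents are available as Document JSON dicts with camelCase keys and a top-level "entities" list, as in the snippet above.

```python
import json
from typing import Any, Dict, List


def find_unanchored_entities(doc: Dict[str, Any], entity_type: str) -> List[str]:
    """Return the ids of top-level entities of `entity_type` with no text segments.

    Hypothetical helper. Assumes camelCase Document JSON; entities without
    an "id" are reported as "?". Nested entity properties are not inspected.
    """
    missing = []
    for entity in doc.get("entities", []):
        if entity.get("type") != entity_type:
            continue
        segments = entity.get("textAnchor", {}).get("textSegments", [])
        if not segments:
            # The condition the uptraining error reports:
            # at least 1 text segment needed, 0 present.
            missing.append(entity.get("id", "?"))
    return missing


# Example document modeled on the entity from the question (no textAnchor).
example = json.loads("""
{
  "text": "...",
  "entities": [
    {"id": "a47c19621d45f83d",
     "normalizedValue": {"text": "invoice_statement"},
     "type": "invoice_type"}
  ]
}
""")

print(find_unanchored_entities(example, "invoice_type"))  # ['a47c19621d45f83d']
```

Running this over every JSON file in the dataset (for example after copying it locally with `gsutil cp -r`) would list the documents expected to fail with this error.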

Answer by Holt Skinner

For an uptrained processor, you can't add a general classification type to the schema unless its value can be extracted directly from text in the document, for example, if the document actually contains the text "Invoice Statement".
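To make that distinction concrete: a label an uptrained extractor can learn must point at a span of the document's text. The sketch below is a hypothetical helper (not a Document AI API) that attaches a text anchor to an entity only when a literal phrase such as "Invoice Statement" occurs in the text, and reports failure otherwise. Treating textSegments offsets as character indices into `text`, and writing them as strings (how int64 fields appear in Document JSON), are assumptions of this sketch.

```python
import re
from typing import Any, Dict


def anchor_label_to_text(doc: Dict[str, Any], entity: Dict[str, Any], phrase: str) -> bool:
    """Anchor `entity` to the first case-insensitive match of `phrase` in doc["text"].

    Hypothetical helper. Returns True and mutates `entity` when the phrase is
    found; returns False and leaves `entity` untouched otherwise.
    """
    text = doc.get("text", "")
    match = re.search(re.escape(phrase), text, re.IGNORECASE)
    if match is None:
        # Nothing in the text to extract: this label needs a classifier
        # rather than an uptrained extractor.
        return False
    start, end = match.span()
    entity["mentionText"] = text[start:end]
    entity["textAnchor"] = {
        # Offsets written as strings, as int64 values appear in Document JSON.
        "textSegments": [{"startIndex": str(start), "endIndex": str(end)}],
        "content": text[start:end],
    }
    return True


entity = {"type": "invoice_type", "normalizedValue": {"text": "invoice_statement"}}
doc = {"text": "ACME Corp\nInvoice Statement\nTotal: 10.00", "entities": [entity]}
print(anchor_label_to_text(doc, entity, "invoice statement"))  # True
print(entity["textAnchor"]["textSegments"])  # [{'startIndex': '10', 'endIndex': '27'}]
```

Documents for which this returns False have nothing in their text to extract, which is the case the answer directs to a splitter/classifier instead.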

You will need to use the Procurement Splitter & Classifier or create a Custom Document Classifier to perform customized classification of documents.