How do you scale Google Cloud Document AI processing?

The page https://cloud.google.com/document-ai/docs/process-forms shows examples of processing a single file. In most cases, though, companies have buckets full of documents. How do you scale Document AI processing in that case? Do you use Document AI in conjunction with Spark, or is there another way?

There are 2 answers

Kevin Eid On

The only thing I could find is batch_process_documents, which processes many documents in one request and returns an asynchronous (long-running) operation. The results are saved to Cloud Storage.
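For illustration, here is a minimal sketch of such a batch request with the google-cloud-documentai Python client, pointed at a whole bucket prefix instead of a single file. The function name, the argument names and the default timeout are my own placeholders. It assumes the processor already exists and that the caller has access to both buckets.

```python
def docai_endpoint(location: str) -> str:
    """Regional API endpoint, e.g. 'us-documentai.googleapis.com'."""
    return f"{location}-documentai.googleapis.com"


def batch_process_prefix(project_id, location, processor_id,
                         gcs_input_prefix, gcs_output_uri, timeout=3600):
    """Submit one batch request for every file under gcs_input_prefix."""
    # Imported lazily so docai_endpoint() works without the client library.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=docai_endpoint(location))
    )
    request = documentai.BatchProcessRequest(
        name=client.processor_path(project_id, location, processor_id),
        # Input: a gs:// prefix rather than an explicit list of files.
        input_documents=documentai.BatchDocumentsInputConfig(
            gcs_prefix=documentai.GcsPrefix(gcs_uri_prefix=gcs_input_prefix)
        ),
        # Output: JSON results are written under this gs:// location.
        document_output_config=documentai.DocumentOutputConfig(
            gcs_output_config=documentai.DocumentOutputConfig.GcsOutputConfig(
                gcs_uri=gcs_output_uri
            )
        ),
    )
    # Returns a long-running operation; result() blocks until it finishes
    # or the (assumed) timeout in seconds expires.
    operation = client.batch_process_documents(request=request)
    operation.result(timeout=timeout)
    return operation
```

A call would look something like batch_process_prefix("my-project", "us", "my-processor-id", "gs://in-bucket/invoices/", "gs://out-bucket/results/"), where all five values are made up.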

From there, I think we can parametrise the job with an input path (a bucket prefix) and distribute the work over several machines, each handling its own prefix.
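A rough sketch of that idea, assuming the documents are already laid out under sub-prefixes (by month here, which is purely hypothetical). The submit argument stands in for whatever function sends one batch request per prefix, and threads stand in for separate machines. Document AI applies quotas to concurrent batch requests, so max_workers would have to stay within whatever limit applies to your project.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def shard_prefixes(bucket: str, base_prefix: str,
                   shards: Iterable[str]) -> List[str]:
    """Build one gs:// input prefix per shard name."""
    base = base_prefix.strip("/")
    root = f"gs://{bucket}/{base}" if base else f"gs://{bucket}"
    return [f"{root}/{shard}/" for shard in shards]


def run_shards(prefixes: List[str], submit: Callable[[str], object],
               max_workers: int = 4) -> Dict[str, object]:
    """Call submit(prefix) for each prefix, a few at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as the inputs.
        results = list(pool.map(submit, prefixes))
    return dict(zip(prefixes, results))


if __name__ == "__main__":
    prefixes = shard_prefixes("my-bucket", "invoices", ["2023-01", "2023-02"])
    # A stand-in for the real submission function.
    print(run_shards(prefixes, lambda p: f"submitted {p}"))
```

The same split works if each prefix is handed to a separate worker or task instead of a thread. The only requirement is that the prefixes do not overlap.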

All of that could be orchestrated with Airflow, for example.

Holt Skinner On

You will need to use Batch Processing to handle multiple documents at once with Document AI.

This page in the Cloud Documentation shows how to make Batch Processing requests with REST and the Client Libraries.

https://cloud.google.com/document-ai/docs/send-request#batch-process

This codelab also illustrates how to do this in Python with the OCR Processor. https://codelabs.developers.google.com/codelabs/docai-ocr-python
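One thing to keep in mind with Batch Processing is that the results are written to Cloud Storage as JSON files rather than returned in the response, so a separate read step is needed. A minimal sketch of that step, assuming the google-cloud-storage and google-cloud-documentai packages are installed. The function names are mine, and gcs_output_uri is whatever output location was given in the batch request.

```python
import re


def split_gcs_uri(uri: str):
    """Split 'gs://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    match = re.match(r"gs://([^/]+)/?(.*)", uri)
    if not match:
        raise ValueError(f"Not a gs:// URI: {uri}")
    return match.group(1), match.group(2)


def read_batch_output(gcs_output_uri: str):
    """Yield a documentai.Document for each JSON file under the output URI."""
    # Imported lazily so split_gcs_uri() works without the client libraries.
    from google.cloud import documentai, storage

    bucket_name, prefix = split_gcs_uri(gcs_output_uri)
    client = storage.Client()
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        # Skip anything that is not a JSON result file.
        if not blob.name.endswith(".json"):
            continue
        yield documentai.Document.from_json(
            blob.download_as_bytes(), ignore_unknown_fields=True
        )
```

Each yielded document exposes fields such as text, so a loop like "for doc in read_batch_output(uri): print(doc.text[:200])" is enough for a quick check. A single large input file may be split across several output JSON files.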