How to Batch Process Long Documents Exceeding the Google Document AI Page Limit?

Thread starter: Leo Glowacki (Guest)

I'm working with Google Document AI to process long documents, where the number of pages exceeds the processor limit (~10k pages). While I found a method in the Document AI Toolbox to create batches for GCS directories containing more files than the processor limit (https://cloud.google.com/document-ai/docs/samples/documentai-toolbox-create-batches), it doesn't address individual files with too many pages.
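
For context, the toolbox approach I found batches the files in a GCS directory roughly like this (bucket and prefix names are placeholders). Note that it batches per file, not per page, which is why it doesn't help with a single oversized document:

    from google.cloud.documentai_toolbox import gcs_utilities

    # Placeholder bucket/prefix; each batch holds at most batch_size files.
    batches = gcs_utilities.create_batches(
        gcs_bucket_name="my-bucket",
        gcs_prefix="input-documents/",
        batch_size=50,
    )

    for batch in batches:
        # Each batch is a BatchDocumentsInputConfig that can be passed to
        # batch_process_documents(); splitting happens at file granularity.
        print(f"{len(batch.gcs_documents.documents)} files in batch")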

Additionally, I discovered a parameter within ProcessOptions (https://cloud.google.com/document-ai/docs/reference/rest/v1beta3/ProcessOptions) for sending a page range when processing online. However, it appears this parameter may not apply to batch processing. When I try to access it using the Python SDK, I encounter an error:

AttributeError: module 'google.cloud.documentai' has no attribute 'IndividualPageSelector'
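
For what it's worth, I suspect IndividualPageSelector may be a nested type on ProcessOptions rather than a top-level name in the module, which would explain the AttributeError, and the docs suggest the page-range options may only be honored for online (synchronous) processing. This is the kind of call I expected to work (processor name and input file are placeholders):

    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient()

    with open("document.pdf", "rb") as f:  # placeholder input file
        pdf_bytes = f.read()

    # Assumption: the selector is nested, i.e.
    # documentai.ProcessOptions.IndividualPageSelector, not
    # documentai.IndividualPageSelector.
    request = documentai.ProcessRequest(
        name="projects/PROJECT/locations/LOCATION/processors/PROCESSOR",  # placeholder
        raw_document=documentai.RawDocument(
            content=pdf_bytes, mime_type="application/pdf"
        ),
        process_options=documentai.ProcessOptions(
            individual_page_selector=documentai.ProcessOptions.IndividualPageSelector(
                pages=[1, 2, 3]  # 1-based page numbers to process
            )
        ),
    )
    result = client.process_document(request=request)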

I understand I can work around the page limit by manually breaking up my files and then combining the output, but I'm looking for a solution that avoids this additional preprocessing and postprocessing.
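
The manual workaround I'd like to avoid would look roughly like the sketch below: split each PDF into chunks under the page limit with pypdf (the chunk size here is an arbitrary placeholder), run each chunk through the processor, then stitch the Document output back together:

    from pypdf import PdfReader, PdfWriter

    def split_pdf(path: str, pages_per_chunk: int = 500) -> list[str]:
        """Split one PDF into chunks small enough for the processor limit."""
        reader = PdfReader(path)
        chunk_paths = []
        for start in range(0, len(reader.pages), pages_per_chunk):
            writer = PdfWriter()
            for i in range(start, min(start + pages_per_chunk, len(reader.pages))):
                writer.add_page(reader.pages[i])
            chunk_path = f"{path}.part{start // pages_per_chunk}.pdf"
            with open(chunk_path, "wb") as out:
                writer.write(out)
            chunk_paths.append(chunk_path)
        return chunk_paths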

Is there a straightforward way to handle batch processing of long documents exceeding the processor page limit without manually splitting and recombining them? Thanks!