Is there a way to get the content from Azure Document Intelligence in markdown BUT separated page by page?

  • Thread starter: Nikola Petrovic (Guest)
This is what I tried. The content comes back as markdown, but it is not separated page by page. On the other hand, if I go into the pages attribute, the per-page entries contain no markdown.

Code:
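from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest, ContentFormat
from azure.core.credentials import AzureKeyCredential

# endpoint, key and doc_bytes are defined elsewhere
document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    analyze_request=AnalyzeDocumentRequest(base64_source=doc_bytes),
    output_content_format=ContentFormat.MARKDOWN
)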

I also tried passing the "pages" parameter to request a single page at a time:

Code:
document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))
poller = document_intelligence_client.begin_analyze_document(
    "prebuilt-layout",
    analyze_request=AnalyzeDocumentRequest(base64_source=doc_bytes),
    output_content_format=ContentFormat.MARKDOWN,
    pages='1'
)

and iterating through the document one page at a time, but even though each request covers only one page, it still takes as long to analyze as the full document.
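One possible direction, sketched here under assumptions rather than confirmed against the service: each entry in result.pages carries a spans list (offset and length), and those spans are assumed to index into the markdown string in result.content. If that holds, a single full-document call can be sliced per page afterwards. The helper name split_markdown_by_page and the stand-in data are hypothetical; the offsets are also assumed to be in the same units as Python string indices.

```python
from types import SimpleNamespace


def split_markdown_by_page(content, pages):
    """Return {page_number: markdown} by slicing content with each page's spans.

    Assumes every page object exposes page_number and spans, and that each span
    has offset and length pointing into content (an assumption, not verified).
    """
    per_page = {}
    for page in pages:
        # A page may carry more than one span; join the slices in order.
        chunks = [content[s.offset : s.offset + s.length] for s in page.spans]
        per_page[page.page_number] = "".join(chunks)
    return per_page


# Stand-in data shaped like result.content and result.pages (hypothetical values).
page_one = "# Title\n\nFirst page text.\n"
page_two = "# Next\n\nSecond page text.\n"
content = page_one + page_two
pages = [
    SimpleNamespace(
        page_number=1,
        spans=[SimpleNamespace(offset=0, length=len(page_one))],
    ),
    SimpleNamespace(
        page_number=2,
        spans=[SimpleNamespace(offset=len(page_one), length=len(page_two))],
    ),
]

by_page = split_markdown_by_page(content, pages)
for number, markdown in by_page.items():
    print(f"--- page {number} ---")
    print(markdown)
```

With a real response, the call would look like split_markdown_by_page(result.content, result.pages) after result = poller.result(). This keeps the analysis to one request instead of one request per page.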
 
