OiO.lk Community platform!

How do I store the vectors generated by AzureOpenAIEmbeddingSkill through the indexer, given my current setup?

  • Thread starter: Mike B

This is a follow-up question to: "Error in Azure Cognitive Search Service when storing document page associated to each chunk extracted from PDF in a custom WebApiSkill".

How do I store the vectors generated by AzureOpenAIEmbeddingSkill through the indexer, given my current setup:

  • Custom WebApiSkill:

Code:
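# chunks, page_numbers, recordId and response_body are defined earlier in the skill (not shown)
# Pair each chunk with the page number it was extracted from
combined_list = [{'textItems': text, 'numberItems': number} for text, number in zip(chunks, page_numbers)]

# response object for specific pdf
response_record = {
    "recordId": recordId,
    "data": {
        "subdata": combined_list
    }
}
response_body['values'].append(response_record)
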
  • Skillset definition:

Code:
{
  ...
  "description": "Skillset to chunk documents and generating embeddings",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
      "name": "splitclean",
      "description": "Custom split skill to chunk documents with specific chunk size and overlap",
      "context": "/document",
      "httpMethod": "POST",
      "timeout": "PT30S",
      "batchSize": 1,
      "degreeOfParallelism": null,
      "authResourceId": null,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "subdata",
          "targetName": "subdata"
        }
      ],
      "authIdentity": null
    },
    {
      "name": "#2",
      "description": "Skill to generate embeddings via Azure OpenAI",
      "context": "/document/subdata/*",
      "apiKey": "<redacted>",
      "deploymentId": "embedding-ada-002",
      "dimensions": null,
      "modelName": "experimental",
      "inputs": [
        {
          "name": "text",
          "source": "/document/subdata/*/textItems"
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "vector"
        }
      ],
      "authIdentity": null
    }
  ],
  "cognitiveServices": null,
  "knowledgeStore": null,
  "indexProjections": {
    "selectors": [
      {
        "parentKeyFieldName": "parent_id",
        "sourceContext": "/document/subdata/*",
        "mappings": [
          {
            "name": "chunk",
            "source": "/document/subdata/*/textItems",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "vector",
            "source": "/document/subdata/*/vector",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "title",
            "source": "/document/metadata_storage_name",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "page_number",
            "source": "/document/subdata/*/numberItems",
            "sourceContext": null,
            "inputs": []
          }
        ]
      }
    ],
    "parameters": {
      "projectionMode": "skipIndexingParentDocuments"
    }
  },
  "encryptionKey": null
}
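To make the intended data flow concrete, here is a simplified local simulation in Python (an illustration only, not how the search service is implemented). The custom WebApiSkill returns one `subdata` item per chunk. Because the embedding skill's context is `/document/subdata/*` and its output `targetName` is `vector`, each embedding is written to `/document/subdata/*/vector`, which is the path the `vector` mapping in `indexProjections` reads; the selector then emits one child document per item. The function names, the fake three-number vectors and the `VECTOR_FIELD` definition (including its profile name) are hypothetical. For the projection to store anything, the target index needs a `vector` field of type `Collection(Edm.Single)` whose `dimensions` match the model; assuming the `embedding-ada-002` deployment serves text-embedding-ada-002, that is 1536.

```python
from typing import Any

# Hypothetical index field for the projected embeddings (REST API 2023-11-01 or later).
# "dimensions" must match the model: 1536 for text-embedding-ada-002.
VECTOR_FIELD: dict[str, Any] = {
    "name": "vector",
    "type": "Collection(Edm.Single)",
    "searchable": True,
    "retrievable": True,
    "dimensions": 1536,
    "vectorSearchProfile": "my-vector-profile",  # hypothetical profile name
}


def build_response_record(record_id: str, chunks: list[str],
                          page_numbers: list[int]) -> dict[str, Any]:
    """Shape one WebApiSkill output record the same way as the snippet in the question."""
    # Pair each chunk with the page number it was extracted from
    combined_list = [
        {"textItems": text, "numberItems": number}
        for text, number in zip(chunks, page_numbers)
    ]
    return {"recordId": record_id, "data": {"subdata": combined_list}}


def add_fake_embeddings(subdata: list[dict[str, Any]]) -> None:
    """Stand-in for AzureOpenAIEmbeddingSkill. With context /document/subdata/*,
    its "embedding" output (targetName "vector") is attached to each item.
    The 3-number vectors are fake; text-embedding-ada-002 returns 1536 floats."""
    for item in subdata:
        text = item["textItems"]
        item["vector"] = [float(len(text)), float(text.count(" ")), 0.0]


def project_chunks(parent_key: str, storage_name: str,
                   subdata: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Mimic the indexProjections selector: one child document per /document/subdata/* item."""
    return [
        {
            "parent_id": parent_key,             # parentKeyFieldName
            "chunk": item["textItems"],          # /document/subdata/*/textItems
            "vector": item["vector"],            # /document/subdata/*/vector
            "title": storage_name,               # /document/metadata_storage_name
            "page_number": item["numberItems"],  # /document/subdata/*/numberItems
        }
        for item in subdata
    ]


if __name__ == "__main__":
    record = build_response_record("1", ["first chunk", "second chunk text"], [1, 2])
    subdata = record["data"]["subdata"]
    add_fake_embeddings(subdata)
    for child in project_chunks("doc-key", "report.pdf", subdata):
        print(child)
```

Running it prints two child documents, each carrying `parent_id`, `chunk`, `vector`, `title` and `page_number`, mirroring the parent key plus the four mappings in the selector.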

I get the following error in AzureOpenAIEmbeddingSkill:

Code:
Web Api response status: 'Unauthorized', Web Api response details: '{"error":{"code":"401","message":"Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource."}}'
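The 401 message ("invalid subscription key or wrong API endpoint") appears to come from the Azure OpenAI resource itself, so it points at the key or endpoint the skill sends rather than at the index projection. As posted, skill `#2` shows neither an `@odata.type` (for this skill, `#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill`) nor a `resourceUri` (the Azure OpenAI endpoint, such as `https://<resource>.openai.azure.com`). They may simply have been trimmed from the question, but both are worth checking, along with whether the `apiKey` belongs to that same resource and whether the `embedding-ada-002` deployment exists there. The Python sketch below is a hypothetical local checker for those fields plus a helper that builds (without sending) a direct request to the Azure OpenAI embeddings REST endpoint, so the key and endpoint can be tested outside the indexer. The function names, the `api-version` value and the example URL are assumptions.

```python
import json
import urllib.request
from typing import Any

# @odata.type expected for the built-in Azure OpenAI embedding skill
EMBEDDING_SKILL_TYPE = "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill"


def check_embedding_skill(skill: dict[str, Any]) -> list[str]:
    """Return likely problems in an AzureOpenAIEmbeddingSkill definition (hypothetical checker)."""
    problems = []
    if skill.get("@odata.type") != EMBEDDING_SKILL_TYPE:
        problems.append("missing or wrong @odata.type")
    # resourceUri must point at the Azure OpenAI resource that issued the apiKey
    uri = skill.get("resourceUri") or ""
    if not uri.startswith("https://"):
        problems.append("resourceUri should be the https:// endpoint of the Azure OpenAI resource")
    if not skill.get("deploymentId"):
        problems.append("missing deploymentId")
    return problems


def build_embeddings_request(resource_uri: str, deployment_id: str, api_key: str,
                             text: str, api_version: str = "2023-05-15") -> urllib.request.Request:
    """Build (but do not send) a POST to the Azure OpenAI embeddings endpoint."""
    url = (f"{resource_uri.rstrip('/')}/openai/deployments/{deployment_id}"
           f"/embeddings?api-version={api_version}")
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(req)`, using the real endpoint and key, and getting the same 401 would suggest the key does not belong to that resource or the endpoint is wrong; a successful response would point back at the skill definition instead.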