Inference with the LLaVA v1.6 Mistral model on Amazon SageMaker

Thread starter: Aleksandar Cvjetic (Guest)
I've deployed the model llava-hf/llava-v1.6-mistral-7b-hf in Amazon SageMaker by simply pasting the deployment code from the model card (https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf). The deployment seems to have gone well, and in the same SageMaker notebook I tried to test inference using the boto3 client and the invoke_endpoint function (I want to send an image and a prompt asking the model to describe what's in the image). The complete deployment and inference code from the SageMaker notebook looks like the following:

Code:
# DEPLOYMENT PART:

import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID':'llava-hf/llava-v1.6-mistral-7b-hf',
    'HF_TASK':'image-text-to-text'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.37.0',
    pytorch_version='2.1.0',
    py_version='py310',
    env=hub,
    role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1, # number of instances
    instance_type='ml.p3.2xlarge' # ec2 instance type
)
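As a side note, the predictor object returned by deploy() can also be used for a quick smoke test directly from the notebook before switching to boto3; the payload below is only an assumed schema, since the exact format the image-text-to-text handler expects is not confirmed:

Code:
# Optional smoke test using the SDK predictor returned by deploy()
# (the {"inputs": ...} payload schema is an assumption, not a confirmed contract)
result = predictor.predict({
    "inputs": "[INST] <image>\nWhat is shown in this image? [/INST]"
})
print(result)

# The generated endpoint name, to reuse with the boto3 runtime client below
print(predictor.endpoint_name)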

Code:
# INFERENCE PART:

import json
from PIL import Image 
import requests

client = boto3.client('sagemaker-runtime')
endpoint_name = 'huggingface-pytorch-inference-2024-06-22-21-48-42-168'

url = "https://www.ikea.com/pl/pl/images/products/silvtjaern-pojemnik__1150132_pe884373_s5.jpg?f=xl"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"

payload = json.dumps(prompt)

response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    Body=payload
)

result = json.loads(response['Body'].read().decode())
print(result)
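(As an aside on the payload above: it serializes only the prompt string, so the downloaded image is never actually sent. Below is a minimal sketch of a request body that also carries the image as base64; the {"inputs": ..., "image": ...} schema is an assumption about what the handler accepts, not a documented contract.)

Code:
# Hedged sketch: include the image in the request body as base64
# (the {"inputs": ..., "image": ...} schema is an assumption, not documented)
import base64
import json

import requests

url = "https://www.ikea.com/pl/pl/images/products/silvtjaern-pojemnik__1150132_pe884373_s5.jpg?f=xl"
image_bytes = requests.get(url, timeout=30).content
image_b64 = base64.b64encode(image_bytes).decode("utf-8")

payload = json.dumps({
    "inputs": "[INST] <image>\nWhat is shown in this image? [/INST]",
    "image": image_b64,
})

response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=payload,
)
print(json.loads(response["Body"].read().decode()))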

My goal is to invoke the endpoint for inference from outside of AWS using a Lambda function and API Gateway (a sketch of such a Lambda handler is included at the end of this post), so I first tried to test inference locally from the SageMaker notebook. After running the inference code above, I got the following error in the notebook:

Code:
ModelError                                Traceback (most recent call last)
Cell In[6], line 3
      1 payload = json.dumps(prompt)
----> 3 response = client.invoke_endpoint(
      4     EndpointName=endpoint_name,
      5     ContentType='application/json',
      6     Body=payload
      7 )
      9 result = json.loads(response['Body'].read().decode())
     10 print(result)

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/botocore/client.py:565, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
    561     raise TypeError(
    562         f"{py_operation_name}() only accepts keyword arguments."
    563     )
    564 # The "self" in this scope is referring to the BaseClient.
--> 565 return self._make_api_call(operation_name, kwargs)

File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/botocore/client.py:1021, in BaseClient._make_api_call(self, operation_name, api_params)
   1017     error_code = error_info.get("QueryErrorCode") or error_info.get(
   1018         "Code"
   1019     )
   1020     error_class = self.exceptions.from_code(error_code)
-> 1021     raise error_class(parsed_response, operation_name)
   1022 else:
   1023     return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "The checkpoint you are trying to load has model type `llava_next` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date."

Can someone help me understand what's wrong here and how to actually invoke this model using a Lambda function and the Python boto3 client?

I checked the following docs:

  • https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf
  • https://medium.com/@liltom.eth/deploy-llava-1-5-on-amazon-sagemaker-168b2efd2489
  • https://medium.com/@vishaaly/how-to-deploy-llava-models-to-sagemaker-endpoints-25a94a58f98c
  • How to perform inference with a Llava Llama model deployed to SageMaker from Huggingface? (Stack Overflow)

but found no similar issue.
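For reference, here is a minimal sketch of the kind of Lambda handler described above; the ENDPOINT_NAME environment variable and the payload schema are assumptions, since the exact request format the endpoint expects is the open question:

Code:
# Hedged sketch of a Lambda handler that forwards an API Gateway request to the endpoint.
# ENDPOINT_NAME and the payload schema are assumptions, not a confirmed contract.
import json
import os

import boto3

runtime = boto3.client("sagemaker-runtime")


def lambda_handler(event, context):
    # With an API Gateway proxy integration, the request body arrives as a JSON string
    body = json.loads(event.get("body") or "{}")

    response = runtime.invoke_endpoint(
        EndpointName=os.environ["ENDPOINT_NAME"],
        ContentType="application/json",
        Body=json.dumps(body),
    )

    result = json.loads(response["Body"].read().decode())
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }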