Loading pre-trained Transformer model with AddedTokens using from_pretrained

• Thread starter: Stefano Mezza (Guest)
I have fine-tuned a "meta-llama/Llama-2-7b-chat-hf" model using the transformers library. Since my model uses additional tokens, I added them to the tokeniser before training and fine-tuned the "embed_tokens" module of the network. My training code looked like this:

Code:
  from transformers import AutoModelForCausalLM, AutoTokenizer, AddedToken
  from peft import LoraConfig

  model_name = "meta-llama/Llama-2-7b-chat-hf"

  # Load the base tokeniser and register the three custom tokens.
  tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, token=hf_token)
  tokenizer.add_special_tokens({"additional_special_tokens": [AddedToken("<|move|>"),
                                                              AddedToken("<|endmove|>"),
                                                              AddedToken("<|end|>")]})

  model = AutoModelForCausalLM.from_pretrained(
      model_name,
      quantization_config=bnb_config,
      device_map=device_map,
      token=hf_token
  )
  # Grow the embedding matrix from 32000 to 32003 rows to cover the new tokens.
  model.resize_token_embeddings(len(tokenizer))

  # LoRA config: adapters on the usual targets, but embed_tokens and lm_head
  # are trained in full and saved with the checkpoint via modules_to_save.
  peft_config = LoraConfig(
      lora_alpha=lora_alpha,
      lora_dropout=lora_dropout,
      r=lora_r,
      bias="none",
      modules_to_save=["embed_tokens", "lm_head"],
      task_type="CAUSAL_LM",
  )
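
For reference, the save step (simplified, with trainer and output_dir standing in for my actual names) looked roughly like this:

Code:
  # 'trainer' wraps the PEFT model during fine-tuning; 'output_dir' is the
  # checkpoint folder referred to as local_model_folder below.
  trainer.model.save_pretrained(output_dir)  # adapter weights + modules_to_save
  tokenizer.save_pretrained(output_dir)      # persists the 32003-token vocabulary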

The model trained and saved successfully. However, when trying to load it using AutoModelForCausalLM.from_pretrained, I get the following error:

Code:
Error(s) in loading state_dict for LlamaForCausalLM:
size mismatch for model.embed_tokens.modules_to_save.default.weight: copying a param with shape torch.Size([32003, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([32003, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096])

I understand that the error arises because the fine-tuned model has three additional tokens, which changes the embedding shape, but how should I load a fine-tuned model whose vocabulary size differs from the base model's?

I looked into the transformers API docs for a way to load models with AddedTokens, but I couldn't find anything. I read a blog post mentioning that passing ignore_mismatched_sizes=True to the from_pretrained function would solve the issue, but it didn't work for me.
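
Even if it had loaded, as I understand it ignore_mismatched_sizes=True simply skips the mismatched tensors and leaves them freshly initialised, so the fine-tuned embedding rows would be lost anyway. One way to see where the 32000 comes from is to inspect the config stored in the checkpoint folder (a quick diagnostic sketch):

Code:
  from transformers import AutoConfig

  cfg = AutoConfig.from_pretrained(local_model_folder)
  # 32000 here means the saved config still describes the unresized base model,
  # so from_pretrained builds a 32000-row embedding matrix and the 32003-row
  # tensors in the checkpoint cannot be copied into it.
  print(cfg.vocab_size)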

EDIT: To load my local model, I use the same from_pretrained function that I use to load the meta-llama model from Hugging Face:

Code:
  model = AutoModelForCausalLM.from_pretrained(
      local_model_folder,
      quantization_config=bnb_config,
      device_map=device_map,
      token=hf_token
  )

This works correctly when loading pre-trained models with no changes to the vocabulary size.
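
Would the right approach be something like the following? This is just a sketch of what I imagine the fix might look like, assuming local_model_folder contains a PEFT adapter together with the saved tokeniser: rebuild the base model, resize its embeddings to match the saved vocabulary, and only then attach the fine-tuned weights.

Code:
  from transformers import AutoModelForCausalLM, AutoTokenizer
  from peft import PeftModel

  # Tokeniser saved during training, so len(tokenizer) == 32003.
  tokenizer = AutoTokenizer.from_pretrained(local_model_folder)

  # The base model still has the original 32000-row embedding matrix.
  base_model = AutoModelForCausalLM.from_pretrained(
      "meta-llama/Llama-2-7b-chat-hf",
      quantization_config=bnb_config,
      device_map=device_map,
      token=hf_token
  )
  # Resize first so the checkpoint's 32003-row tensors fit.
  base_model.resize_token_embeddings(len(tokenizer))

  # Now the adapter (including the saved embed_tokens/lm_head) should load
  # without a size mismatch.
  model = PeftModel.from_pretrained(base_model, local_model_folder)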
 
