Masked Query Gradient Flow to Keys and Values

Thread starter: DeerFreak (Guest)
I was wondering why the gradients in this scaled dot product attention example do not flow to the key and value tensors. What am I doing wrong? How can I use padded batches with different target sequence lengths? Can I pad keys/values and queries together?

Code:
import torch
from torch.nn.functional import scaled_dot_product_attention

k = v = torch.rand(3, 4, 8)   # note: k and v alias the *same* tensor
q = torch.rand(3, 5, 8)

q.requires_grad = True
k.requires_grad = True
v.requires_grad = True        # redundant: v is the same tensor as k

# Boolean mask: True = attend, False = mask out
mask = torch.ones(3, 5, 4, dtype=torch.bool)
mask[:, :, -1] = 0   # mask the last key for every query
mask[:, -1, :] = 0   # mask *every* key for the last query (fully masked row)

out = scaled_dot_product_attention(q, k, v, attn_mask=mask)
torch.mean(out[:, :-1, :]).backward()  # loss excludes the fully masked query row
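
A likely culprit, for anyone hitting the same thing: `mask[:, -1, :] = 0` masks out every key for the last query row, so the softmax for that row sees only -inf scores and returns NaN. Slicing the output with `out[:, :-1, :]` keeps those NaNs out of the loss value, but the attention backward still touches the fully masked row, and the NaNs propagate into `q.grad`, `k.grad`, and `v.grad`. Below is a minimal sketch of one common workaround, assuming the last key and last query positions are padding: mask only the padded key position in `attn_mask`, and handle the padded query row by excluding it from the loss instead. The sketch also gives `k` and `v` separate storage so each gets its own gradient.

Code:
import torch
from torch.nn.functional import scaled_dot_product_attention

# Separate tensors so k and v each receive their own gradient
# (shapes match the question: batch=3, 4 key positions, 5 query positions, dim=8)
k = torch.rand(3, 4, 8, requires_grad=True)
v = torch.rand(3, 4, 8, requires_grad=True)
q = torch.rand(3, 5, 8, requires_grad=True)

# Mask only the padded *key* position; every query row still attends
# to at least one key, so no softmax row is entirely -inf.
mask = torch.ones(3, 5, 4, dtype=torch.bool)
mask[:, :, -1] = False

out = scaled_dot_product_attention(q, k, v, attn_mask=mask)

# The padded *query* row is handled on the loss side, not in attn_mask
loss = out[:, :-1, :].mean()
loss.backward()

# All gradients are now finite
print(torch.isfinite(q.grad).all(), torch.isfinite(k.grad).all(), torch.isfinite(v.grad).all())

The same idea extends to batches with different target sequence lengths: build the key part of `attn_mask` from the key padding only, and slice away or zero out the padded query positions when reducing the loss, since each query row's output depends only on that row.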