How do I fit a normal distribution to a dirac delta distribution for behavior cloning in reinforcement learning?

Thread starter: Tigertron (Guest)
I am attempting to apply behavior cloning to actor-critic agents that operate in a continuous action space. The demonstration data forms a Dirac delta distribution, denoted p_bc(a|x), where a is the action and x is the state. I have a test neural network representing the actor, which tries to learn a mean mu(x) and standard deviation sigma(x) for a given x. I have tested MSE and NLL as loss functions. I also wanted to test a KL loss, but I could not find a Dirac delta distribution in PyTorch's torch.distributions module, so I am unsure how to set that up. Basically, my question is: what am I doing wrong, and what alternatives could I try?
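(For context on what I mean by NLL here: since the target is a Dirac delta at the demonstrated action a*, the cross-entropy between p_bc(a|x) and the Normal reduces to -log N(a*; mu(x), sigma(x)), so the NLL loss is just the negative log-probability of the demonstration. A minimal sketch with made-up values:)

```python
import math
import torch
from torch.distributions import Normal

# Made-up values for illustration: a predicted mu/sigma and one demonstrated action.
mu = torch.tensor([0.2])
sigma = torch.tensor([0.5])
a_star = torch.tensor([-1.0])

# Negative log-likelihood of the demonstrated action under the predicted Normal.
nll = -Normal(mu, sigma).log_prob(a_star)

# Cross-check against the closed form:
# -log N(a*; mu, sigma) = 0.5 * log(2*pi*sigma^2) + (a* - mu)^2 / (2*sigma^2)
closed_form = 0.5 * math.log(2 * math.pi * sigma.item() ** 2) \
    + (a_star - mu) ** 2 / (2 * sigma ** 2)
```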

I have some test neural network in pytorch

Code:
import torch
import torch.nn as nn

class DistributionFitModel(nn.Module):
    def __init__(self):
        super().__init__()
        
        # Shared trunk
        self.model = nn.Sequential(
            nn.Linear(1, 16),
            nn.Linear(16, 32),
            nn.Linear(32, 16)
        )
        
        # Separate heads for the mean and the log-variance
        self.mu = nn.Sequential(nn.Linear(16, 1), nn.ReLU())
        self.logvar = nn.Sequential(nn.Linear(16, 1), nn.ReLU())
        
    def forward(self, x):
        x = self.model(x)
        mu = self.mu(x)
        logvar = self.logvar(x)
        
        # Reparameterization trick: action = mu + std * eps
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return eps.mul(std).add_(mu)
    
    def mu_sigma(self, x):
        x = self.model(x)
        return self.mu(x), torch.exp(0.5 * self.logvar(x))

I use logvar instead of the variance directly because that's what I've seen VAE implementations do.
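(The point of the logvar parameterization, as I understand it, is that the network can output any real number and sigma = exp(0.5 * logvar) is still strictly positive:)

```python
import torch

# Any real-valued logvar maps to a strictly positive standard deviation.
logvar = torch.tensor([-2.0, 0.0, 3.0])
std = torch.exp(0.5 * logvar)
print(std)  # tensor([0.3679, 1.0000, 4.4817])
```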

Code:
import numpy as np

keys = [(0, -1), (1, 1), (2, -1), (3, 0), (4, 1), (5, 0)]
ys = []
xs = []
L = 1000  # samples per (x, a) pair
for x_mul, y_mul in keys:
    ys += (np.ones(L) * y_mul).tolist()
    xs += (np.ones(L) * x_mul).tolist()

I generate 1000 samples for each x with its corresponding action (i.e., x = 0, a = -1 for the first key).

I have a dataset/dataloader to load the generated data:

Code:
import torch
from torch.utils.data import Dataset, DataLoader

class DistDataset(Dataset):
    def __init__(self, x, y):
        super().__init__()
        self.x = x
        self.y = y
        
    def __len__(self):
        return len(self.x)
        
    def __getitem__(self, item):
        return torch.tensor(self.x[item]), torch.tensor(self.y[item])

dataset = DistDataset(xs, ys)
dataloader = DataLoader(dataset, batch_size=512, shuffle=True)
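(A quick sanity check I can run on one batch, with the dataset class repeated and the data shrunk so the snippet is standalone:)

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Same dataset as above, on a tiny hand-made sample for illustration.
class DistDataset(Dataset):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __len__(self):
        return len(self.x)
    def __getitem__(self, item):
        return torch.tensor(self.x[item]), torch.tensor(self.y[item])

xs = [0.0] * 4 + [1.0] * 4
ys = [-1.0] * 4 + [1.0] * 4
loader = DataLoader(DistDataset(xs, ys), batch_size=4, shuffle=False)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # torch.Size([4]) torch.Size([4])
```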

I have the training loop set up as:

Code:
import torch
import torch.nn.functional as F
from torch.distributions import Normal

model = DistributionFitModel()
optimizer = torch.optim.Adam(model.parameters())  # optimizer setup shown for completeness

for _ in range(1000):
    running_loss = 0
    for x, y in dataloader:
        x = x.reshape(-1, 1).float()
        y = y.reshape(-1, 1).float()
        
        pred = model(x).reshape(-1, 1)
        optimizer.zero_grad()
        loss = F.mse_loss(pred, y)
        
        # mu, sigma = model.mu_sigma(x)
        # dist = Normal(mu, sigma)
        # values = dist.log_prob(y)
        # loss += torch.sum(values)
        
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
    print(running_loss / len(dataloader))

When I call model.mu_sigma(x), I get mu = 0 and sigma = 1 for all inputs.
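(For reference, this is how I call it — model class repeated so the snippet runs on its own; here on an untrained model, so the numbers come purely from initialization:)

```python
import torch
import torch.nn as nn

# Same architecture as above, reproduced so this check is standalone.
class DistributionFitModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(1, 16),
            nn.Linear(16, 32),
            nn.Linear(32, 16),
        )
        self.mu = nn.Sequential(nn.Linear(16, 1), nn.ReLU())
        self.logvar = nn.Sequential(nn.Linear(16, 1), nn.ReLU())

    def mu_sigma(self, x):
        x = self.model(x)
        return self.mu(x), torch.exp(0.5 * self.logvar(x))

model = DistributionFitModel()
x = torch.tensor([[0.0], [1.0], [2.0]])
with torch.no_grad():
    mu, sigma = model.mu_sigma(x)
print(mu.squeeze(1), sigma.squeeze(1))
```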

Is there something I am doing wrong? Is there a better loss function I should be using?