<p>I am trying to apply behavior cloning to actor-critic agents that operate in a continuous action space. I have a Dirac delta function, denoted p_bc(a|x), where a is the action and x is the state. I have a test neural network representing the actor, which tries to learn a mean μ(x) and a standard deviation σ(x) for a given x. I have tested MSE and NLL as loss functions. I also wanted to test a KL loss, but I could not find any documentation on Dirac delta functions, so I am unsure whether PyTorch has a distribution module for them. Basically, my question is: what am I doing wrong, and what alternatives could I try?</p>
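
<p>For context, a sketch of how a KL loss against a Dirac target relates to a log-likelihood loss. It assumes the actor is a Gaussian, written here as \pi(a|x) with mean \mu(x) and standard deviation \sigma(x), and it writes the demonstrated action as a^* (a label introduced only for this sketch):</p>

<p>$$ \mathrm{KL}(p_{bc} \,\|\, \pi) = -H(p_{bc}) - \int p_{bc}(a|x) \log \pi(a|x) \, da $$</p>

<p>$$ -\int \delta(a - a^*) \log \pi(a|x) \, da = -\log \pi(a^*|x) = \frac{(a^* - \mu(x))^2}{2\sigma(x)^2} + \log \sigma(x) + \frac{1}{2}\log 2\pi $$</p>

<p>The entropy term H(p_bc) does not depend on the network (and is not finite for a Dirac delta, so the KL itself is not a usable number). Under these assumptions, the only part that carries a gradient is the Gaussian negative log-likelihood of the demonstrated action.</p>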

<p>Here is my test network in PyTorch:</p>

<pre><code>import torch
import torch.nn as nn


class DistributionFitModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared trunk
        self.model = nn.Sequential(
            nn.Linear(1, 16),
            nn.Linear(16, 32),
            nn.Linear(32, 16)
        )
        # Heads for the mean and the log-variance
        self.mu = nn.Sequential(nn.Linear(16, 1), nn.ReLU())
        self.logvar = nn.Sequential(nn.Linear(16, 1), nn.ReLU())

    def forward(self, x):
        x = self.model(x)
        mu = self.mu(x)
        logvar = self.logvar(x)
        std = torch.exp(0.5 * logvar)
        # Reparameterized sample: mu + std * eps
        eps = torch.randn_like(std)
        return eps.mul(std).add_(mu)

    def mu_sigma(self, x):
        # Return the mean and standard deviation without sampling
        x = self.model(x)
        return self.mu(x), torch.exp(0.5 * self.logvar(x))
</code></pre>

<p>I use logvar instead of var directly because that is what I have seen VAE implementations do.</p>
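
<p>A small sketch of how the log-variance parameterization maps to a standard deviation, and what a ReLU on that head implies. It uses plain Python floats rather than tensors, and the helper names and raw values are made up for illustration. Because a ReLU output is never negative, exp(0.5 * logvar) can never drop below 1 with this head.</p>

<pre><code>import math


def relu(value: float) -> float:
    """Scalar ReLU: negative inputs are clamped to zero."""
    return max(0.0, value)


def std_from_logvar(logvar: float) -> float:
    """Standard deviation implied by a log-variance, as in exp(0.5 * logvar)."""
    return math.exp(0.5 * logvar)


# Hypothetical raw outputs of the last linear layer of the logvar head.
raw_logvars = [-4.0, -1.0, 0.0, 2.0]

# Without an activation, a negative log-variance gives a std below 1.
free_stds = [std_from_logvar(v) for v in raw_logvars]

# With a ReLU on the head, the log-variance is never negative,
# so the std is at least 1.
relu_stds = [std_from_logvar(relu(v)) for v in raw_logvars]

print(free_stds)
print(relu_stds)
</code></pre>

<p>The same reasoning would apply to the mean head: a ReLU there keeps the mean non-negative, while the data below contains the target action -1.</p>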

<pre><code>import numpy as np

# (x, action) pairs; each pair is repeated L times
keys = [(0, -1), (1, 1), (2, -1), (3, 0), (4, 1), (5, 0)]
ys = []
xs = []
L = 1000
for x_mul, y_mul in keys:
    ys += (np.ones(L) * y_mul).tolist()
    xs += (np.ones(L) * x_mul).tolist()
</code></pre>

<p>I generate 1000 samples for each x with its corresponding action (e.g. x = 0, a = -1 for the first key).</p>

<p>I have a dataset and dataloader to load the generated data:</p>

<pre><code>import torch
from torch.utils.data import Dataset, DataLoader


class DistDataset(Dataset):
    def __init__(self, x, y):
        super().__init__()
        self.x = x
        self.y = y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, item):
        return torch.tensor(self.x[item]), torch.tensor(self.y[item])


dataset = DistDataset(xs, ys)
dataloader = DataLoader(dataset, batch_size=512, shuffle=True)
</code></pre>

<p>I have the training loop set up as follows:</p>

<pre><code>import torch.nn.functional as F

for _ in range(1000):
    running_loss = 0
    for x, y in dataloader:
        x = x.reshape(-1, 1).float()
        y = y.reshape(-1, 1).float()
        pred = model(x).reshape(-1, 1)
        optimizer.zero_grad()
        loss = F.mse_loss(pred, y)
        # mu, sigma = model.mu_sigma(x)
        # dist = Normal(mu, sigma)
        # values = dist.log_prob
        # loss += torch.sum(values)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(running_loss / len(dataloader))
</code></pre>
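
<p>The commented-out lines above look like the start of an NLL loss. Below is a minimal sketch of the Gaussian negative log-likelihood for a single sample, written with plain Python floats so it runs without PyTorch; the function name is made up for illustration. With tensors, the analogous expression would be <code>-Normal(mu, sigma).log_prob(y).mean()</code>, with log_prob called on the targets and the sign flipped so that the quantity is minimized.</p>

<pre><code>import math


def gaussian_nll(action: float, mu: float, sigma: float) -> float:
    """Negative log-density of `action` under Normal(mu, sigma)."""
    var = sigma * sigma
    return 0.5 * math.log(2.0 * math.pi * var) + (action - mu) ** 2 / (2.0 * var)


# Target action -1 (the x = 0 key in the generated data).
target = -1.0

# mu = 0 and sigma = 1, the values the model currently outputs.
nll_current = gaussian_nll(target, mu=0.0, sigma=1.0)

# A mean on the target with a small sigma scores a much lower NLL.
nll_fitted = gaussian_nll(target, mu=-1.0, sigma=0.1)

print(nll_current, nll_fitted)
</code></pre>

<p>Note that with a point target this loss keeps decreasing as sigma shrinks, so some lower bound on sigma is probably needed in practice.</p>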

<p>When I call model.mu_sigma(x), I get mu = 0 and sigma = 1 for every input.</p>
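
<p>One way to see what the MSE loss on the sampled output optimizes, assuming pred = \mu(x) + \sigma(x)\epsilon with \epsilon drawn from a standard normal as in forward, and writing a for the target action:</p>

<p>$$ \mathbb{E}_{\epsilon}\left[(\mu + \sigma\epsilon - a)^2\right] = (\mu - a)^2 + 2\sigma(\mu - a)\,\mathbb{E}[\epsilon] + \sigma^2\,\mathbb{E}[\epsilon^2] = (\mu - a)^2 + \sigma^2 $$</p>

<p>Under that assumption the expected loss is smallest when \mu(x) equals the target and \sigma(x) is as small as the network allows.</p>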

<p>Is there something I am doing wrong? Is there a better loss function I should be using?</p>