Is it possible to build a custom TF-Agents environment that returns multiple trajectories?

I need to write a custom environment in TF-Agents that runs a fluid-dynamics simulation and applies actions to it. I have coded a basic environment that returns an observation and a reward after the agent applies one action. Now I need to "upgrade" this environment. Basically, I want to divide the environment into N pseudo-environments. Each pseudo-environment has its own scalar observation and its own scalar action, while the reward is the same for all pseudo-environments. I would then like the environment to return N trajectories and to train the same agent on these N trajectories (as if I were running N parallel environments).

I started by using ParallelPyEnvironment, so I create N parallel environments where only the one with id=0 actually runs the simulation and the others just wait for the simulation to finish. This works, but it is not efficient, because I create N environments of which N-1 do nothing except wait for id=0. It would be much more efficient (and simpler) to have a single environment that reads all N observations, applies all N actions (predicted by an agent that has a scalar action space and is therefore run on each of the N observations) and automatically creates N trajectories. I have no idea how to do this, and I could not find anything on the internet either. Basically, it would mean having batch_size = N with just one environment. Does anyone have any idea? Thank you very much!
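
To make the "batch_size = N but just one environment" idea concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not TF-Agents code: `BatchedSimEnv`, `Transition`, `_run_simulation` and `policy` are hypothetical names, and the "simulation" is a toy stand-in for the real fluid-dynamics solver. The point is that one `step` call runs the simulation once and produces N transitions that share the same reward.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Transition:
    """One step of one pseudo-environment (hypothetical container)."""
    observation: float
    action: float
    reward: float
    next_observation: float


class BatchedSimEnv:
    """A single environment that exposes N scalar pseudo-environments."""

    def __init__(self, n: int, seed: int = 0):
        self.batch_size = n
        self._rng = random.Random(seed)
        self._obs = [0.0] * n

    def reset(self) -> List[float]:
        # One scalar observation per pseudo-environment.
        self._obs = [self._rng.uniform(-1.0, 1.0) for _ in range(self.batch_size)]
        return list(self._obs)

    def _run_simulation(self, actions: List[float]):
        # Toy stand-in for the solver (assumption): each value relaxes
        # toward its action, and one global reward is computed.
        next_obs = [0.5 * o + 0.5 * a for o, a in zip(self._obs, actions)]
        reward = -sum(x * x for x in next_obs) / len(next_obs)
        return next_obs, reward

    def step(self, actions: List[float]) -> List[Transition]:
        if len(actions) != self.batch_size:
            raise ValueError("expected one scalar action per pseudo-environment")
        # The simulation runs once for all N actions.
        next_obs, reward = self._run_simulation(actions)
        # N transitions, all carrying the same shared reward.
        transitions = [
            Transition(o, a, reward, no)
            for o, a, no in zip(self._obs, actions, next_obs)
        ]
        self._obs = next_obs
        return transitions


def policy(observation: float) -> float:
    # Hypothetical scalar policy, applied independently to each observation.
    return -observation


env = BatchedSimEnv(n=4)
obs = env.reset()
actions = [policy(o) for o in obs]
transitions = env.step(actions)
```

In TF-Agents itself, `PyEnvironment` has `batched` and `batch_size` properties that a subclass can override to declare that its time steps and actions carry an outer batch dimension. That may be the hook for this, but whether it fits a given driver and replay buffer setup is something I have not verified.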
