I’m implementing imitation learning using the DAgger algorithm from the imitation library in Python. The environment I’m working with is a custom Gym environment that simulates a shallow lake management problem. The expert policy is generated from an optimization process, and I’m trying to train a learner policy using the DAgger framework.
I’m encountering the following error when attempting to extend and update the DAgger trainer with new demonstrations:
```
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 51 has 1 dimension(s)
```

The error is raised from the `flatten_trajectories` function in the imitation library’s `rollout.py` when it concatenates the per-trajectory reward arrays; the traceback bottoms out in:

```
File "/imitation/data/types.py", line 224, in concatenate_maybe_dictobs
    return np.concatenate(arrs)
```
Steps Taken to Debug:

- I’ve already tried reshaping the reward array within my environment to ensure that it is always a 1D array. In my `step()` function, I flatten the rewards as follows:

  ```python
  rewards = np.array(rewards, dtype=np.float32).reshape(batch_size,)
  ```

  I confirmed that within my environment, the rewards are consistently shaped as `(1,)`.
- I’ve added debugging statements in `rollout.py` to track the reward shapes. Interestingly, all rewards seem to have the shape `(1, 1)` before being replaced, even though I expect them to be `(1,)` (see the shape-checking wrapper sketch below).
- I was initially using a `RolloutInfoWrapper` in my environment’s vectorized wrapper:

  ```python
  lake_env = DummyVecEnv([lambda: RolloutInfoWrapper(FlattenObservation(gym.make("LakeEnv-v1"))) for _ in range(n_envs)])
  ```

  After removing `RolloutInfoWrapper`, I still encountered the same issue.
Here’s my environment setup:

- The action space is continuous: `spaces.Box(low=np.array([0.01]), high=np.array([0.1]), dtype=np.float32)`
- The observation space is a single scalar: `spaces.Box(low=0.0, high=2.0, shape=(1,), dtype=np.float32)`
- The reward is calculated based on the imitation accuracy of the learner relative to the expert.
Despite these adjustments, I continue to encounter the dimension mismatch in the `flatten_trajectories` function during the rollout phase.
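I also tried narrowing down where the extra dimension first appears with a small shape-checking wrapper around the raw environment, before it goes into `DummyVecEnv`. This is my own debugging helper, not part of the imitation library; the name `RewardShapeCheck` is just illustrative, and it assumes the Gymnasium-style 5-tuple `step()` return my environment already uses:

```python
import numpy as np
import gymnasium as gym

class RewardShapeCheck(gym.Wrapper):
    """Debugging helper (mine, illustrative): fail loudly if step() ever
    returns a non-scalar reward, since DummyVecEnv stores one scalar
    reward per sub-environment."""

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        r = np.asarray(reward)
        assert r.ndim == 0, f"step() returned reward of shape {r.shape}, expected a scalar"
        return obs, reward, terminated, truncated, info
```

With this in place (e.g. `RewardShapeCheck(FlattenObservation(gym.make("LakeEnv-v1")))`), the assertion trips immediately whenever a `(1,)` or `(1, 1)` reward leaks out of `step()`, which is exactly the shape I’m seeing in `rollout.py`.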
Relevant Code:

Here’s a snippet of my environment’s `step()` function for context:
```python
def step(self, learner_action):
    if len(learner_action.shape) == 2:  # if batched, shape should be (batch_size, 1)
        batch_size = learner_action.shape[0]
    else:
        batch_size = 1
    lake_states = np.repeat(self.lake_state, batch_size)
    learner_action = np.clip(learner_action, 0.01, 0.1)
    next_lake_states = []
    rewards = []
    terminated = []
    for i in range(batch_size):
        lake_state = lake_states[i]
        action = learner_action[i]
        # Lake dynamics: phosphorus recycling plus stochastic natural inflow
        P_recycling = (lake_state ** self.q) / (1 + lake_state ** self.q)
        # lognormal inflow parameters mu/sigma are set at __init__, like self.q and self.b
        natural_inflow = np.random.lognormal(mean=self.mu, sigma=self.sigma)
        next_lake_state = lake_state * (1 - self.b) + P_recycling + action + natural_inflow
        next_lake_state = np.clip(next_lake_state, self.observation_space.low[0], self.observation_space.high[0])
        expert_action = self.expert_policy.predict([lake_state])[0]
        reward = float(-abs(action - expert_action))  # Reward based on imitation accuracy
        next_lake_states.append(next_lake_state)
        rewards.append(reward)
        terminated.append(self.year >= self.n_years)
    obs = np.array(next_lake_states, dtype=np.float32).reshape(batch_size, 1)
    rewards = np.array(rewards, dtype=np.float32).reshape(batch_size,)  # Reshaping to 1D
    return obs, rewards, terminated, False, {}
```
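For reference, here is the standalone sanity check I run against the vectorized environment. It is my own harness, assuming `LakeEnv-v1` is already registered and using the Gymnasium API plus Stable-Baselines3’s `DummyVecEnv`:

```python
import numpy as np
import gymnasium as gym
from gymnasium.wrappers import FlattenObservation
from stable_baselines3.common.vec_env import DummyVecEnv

# Sanity-check harness: step the vec env with random actions and inspect
# the reward shape that DummyVecEnv hands back on every step.
venv = DummyVecEnv([lambda: FlattenObservation(gym.make("LakeEnv-v1"))])
obs = venv.reset()
for _ in range(5):
    actions = np.stack([venv.action_space.sample() for _ in range(venv.num_envs)])
    obs, rewards, dones, infos = venv.step(actions)
    print(rewards.shape, rewards.dtype)  # I expect (1,) float32 every step
```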
Questions:

- What could be causing the reward arrays to still have inconsistent shapes during the `flatten_trajectories` function?
- How can I ensure that all rewards are consistently 1D arrays throughout the rollout process and avoid this dimension mismatch?
- Is there something I’m missing in my environment’s `step()` function or the way I’m handling the rewards?