Issue with Multi-Agent Environment Observation Space in RLlib

I'm working on a multi-agent environment with two agents (nodes), where each agent has to decide whether or not to transmit based on its observation of its own capacity and the capacity of the other agent. Only one agent should transmit, and it should be the one with the higher capacity at that moment. I represent the agents with dictionaries, but I have a problem with the observation space that I can't resolve. My class looks like this:

from gymnasium.spaces import Box, Dict, Discrete
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class CapacityEnv(MultiAgentEnv):
    def __init__(self):
      self.action_space = Discrete(2) # 0 not transmit, 1 transmit
      self.observation_space = Dict({
            "1": Box(low=0, high=101, dtype=int), # Maximum capacity is 100 for each agent
            "2": Box(low=0, high=101, dtype=int)
        })
      self.node_capacity = {"1": 100, "2": 100}

    def step(self, action_dict):
      node_choice_1 = action_dict["1"]
      node_choice_2 = action_dict["2"]

      rewards = {"1": 0, "2": 0}  

      if node_choice_1 == node_choice_2: 
          rewards = {"1": -10, "2": -10}
      
      if node_choice_1 == 0 and node_choice_2 == 1: 
          if self.node_capacity["2"] >= self.node_capacity["1"]:
            rewards = {"1": 10, "2": 10}
          else:
            rewards = {"1": -10, "2": -10}
          self.node_capacity["1"] = self.node_capacity["1"]
          self.node_capacity["2"] = self.node_capacity["2"] - 5

      elif node_choice_1 == 1 and node_choice_2 == 0:
          if self.node_capacity["1"] >= self.node_capacity["2"]:
            rewards = {"1": 10, "2": 10}
          else:
            rewards = {"1": -10, "2": -10}
          self.node_capacity["1"] = self.node_capacity["1"] - 5
          self.node_capacity["2"] = self.node_capacity["2"] 
      print(self.node_capacity)
      observations = self.node_capacity
   
      if self.node_capacity["1"] == 0 or self.node_capacity["2"] == 0:
        done = True
      else:
        done = False
      
      return observations, rewards, done, False, {}
      
    def reset(self, *, seed=None, options=None):
      self.node_capacity = {"1": 100, "2": 100}
      print(self.node_capacity)
      observations = self.node_capacity
      return observations, {}

However, when I train my agents using RLlib:

from ray.rllib.algorithms.dqn import DQNConfig
from ray.tune.logger.logger import pretty_print
config = DQNConfig().environment(CapacityEnv).training(gamma=0.9, lr=0.001, train_batch_size=512)
agent = config.build()
for i in range(2):
    result = agent.train()
    print(pretty_print(result))

I encounter the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-69-7e2ec9487891> in <cell line: 2>()
      1 from ray.tune.logger.logger import pretty_print
      2 for i in range(2):
----> 3     result = agent.train()
      4     print(pretty_print(result))

20 frames
/usr/local/lib/python3.10/dist-packages/tree/__init__.py in assert_same_structure(a, b, check_types)
    286     str1 = str(map_structure(lambda _: _DOT, a))
    287     str2 = str(map_structure(lambda _: _DOT, b))
--> 288     raise type(e)("%s\n"
    289                   "Entire first structure:\n%s\n"
    290                   "Entire second structure:\n%s"

ValueError: The two structures don't have the same nested structure.

First structure: type=int str=100

Second structure: type=OrderedDict str=OrderedDict([('1', 55), ('2', 94)])

More specifically: Substructure "type=OrderedDict str=OrderedDict([('1', 55), ('2', 94)])" is a sequence, while substructure "type=int str=100" is not
Entire first structure:
.
Entire second structure:
OrderedDict([('1', .), ('2', .)])

I think the issue is related to the observation space, but I haven't been able to fix it, especially in the context of RLlib and MultiAgentEnv. Any guidance or insights on how to resolve this problem would be greatly appreciated. Thank you!

1 Answer

Rafid Abyaad

There are a couple of things that need to be fixed in the env implementation.

  1. RLlib expects the observation space to be defined per agent. You can use a simple per-agent observation space and also set the agent ids, like so:

     self.observation_space = Box(low=0, high=101, dtype=int)
     self._agent_ids = ["1","2"]
    
  2. If your observation space is a Box, RLlib expects each observation to be a numpy ndarray, so you should build a dictionary that looks something like {"1": np.array([99]), "2": np.array([2])}. Specifically, this error is thrown after reset() is called, when the observations do not match the space. You can see what the observation values should look like by sampling your observation space, like so:

       self.node_capacity["1"] = self.observation_space.sample()
       self.node_capacity["2"] = self.observation_space.sample()
       observations = self.node_capacity
    
  3. Your dones (terminateds) and truncateds should also be dictionaries containing a bool for each agent plus an "__all__" key. RLlib will complain otherwise because it expects a multi-agent dict.
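
     For example, the last two return values of step() might look like this (a sketch of the expected shape only; done here is the episode-end flag your step() already computes):

       terminateds = {"1": done, "2": done, "__all__": done}
       truncateds = {"1": False, "2": False, "__all__": False}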

I suggest keeping the observations and rewards as ints/numpy arrays and turning them into dictionaries before returning from reset() and step().
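
Putting these points together, a corrected version of the environment could look roughly like the sketch below. It assumes the gymnasium-style spaces and the newer MultiAgentEnv API that the question's code already uses; the _obs() helper is an illustrative name, and the reward/capacity logic is copied from the question. Treat it as a starting point rather than a tested implementation.

import numpy as np
from gymnasium.spaces import Box, Discrete
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class CapacityEnv(MultiAgentEnv):
    def __init__(self):
        super().__init__()
        self.action_space = Discrete(2)  # 0 = do not transmit, 1 = transmit
        # Per-agent observation space; the explicit shape/dtype are assumptions for this sketch.
        self.observation_space = Box(low=0, high=101, shape=(1,), dtype=np.int64)
        self._agent_ids = ["1", "2"]  # agent ids, as suggested above
        self.node_capacity = {"1": 100, "2": 100}

    def _obs(self):
        # Wrap each scalar capacity in a numpy array so it matches the Box space.
        return {aid: np.array([cap], dtype=np.int64) for aid, cap in self.node_capacity.items()}

    def reset(self, *, seed=None, options=None):
        self.node_capacity = {"1": 100, "2": 100}
        return self._obs(), {}

    def step(self, action_dict):
        a1, a2 = action_dict["1"], action_dict["2"]
        rewards = {"1": 0, "2": 0}
        if a1 == a2:  # both transmit or both stay silent
            rewards = {"1": -10, "2": -10}
        elif a2 == 1:  # only node 2 transmits
            r = 10 if self.node_capacity["2"] >= self.node_capacity["1"] else -10
            rewards = {"1": r, "2": r}
            self.node_capacity["2"] -= 5
        else:  # only node 1 transmits
            r = 10 if self.node_capacity["1"] >= self.node_capacity["2"] else -10
            rewards = {"1": r, "2": r}
            self.node_capacity["1"] -= 5
        done = self.node_capacity["1"] == 0 or self.node_capacity["2"] == 0
        terminateds = {"1": done, "2": done, "__all__": done}
        truncateds = {"1": False, "2": False, "__all__": False}
        return self._obs(), rewards, terminateds, truncateds, {}

With observations, rewards, terminateds, and truncateds all keyed by agent id (plus "__all__" for the episode-end flags), the returned structures match the per-agent Box space, and the assert_same_structure error from the traceback should go away.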