RLlib: evaluation on a batch of observations


We would like to evaluate our RL model on a batch of pre-saved observations (as opposed to the agent's natural walk through the environment). We expected algo.compute_actions() to allow this, but we cannot find an input representation that the function accepts. Ideally, we would pass it batches of observations as any of: pandas DataFrames, NumPy arrays, Python dictionaries, lists, …

Temporary workaround: the issue seems to be related to missing preprocessors on the local worker for the default policy. See the workaround at the end of this post.

Below I am pasting the minimal code that recreates the issue with data representations tested so far.

I am also listing the errors we are getting with each of the input versions:

Thanks in advance for any help.

"""
! pip install ray
! pip install gymnasium
! pip install dm_tree
! pip install tensorflow
! pip install tensorflow-probability
#"""

import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig

algo = (
    PPOConfig()
    .environment("CartPole-v1")
    .framework("tf2")
    .rollouts(num_rollout_workers=0)
    .build()
)

for i in range(10):
    print(i)
    print("algo.compute_single_action() SINGLE",algo.compute_single_action({"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])}))

#"""
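# DOES NOT BREAK BUT RETURNS ONLY A SINGLE ACTION
# (a dict literal keeps only the last value for a repeated key, so only one "obs" entry is actually passed)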
print("algo.compute_single_action() MANY",algo.compute_single_action({"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])}))


"""
# SpecCheckingError: input spec validation failed on TfMLPEncoder.call, Mismatch found in data element ('obs',), 
# which is a TensorSpec: Expected data type <class 'tensorflow.python.framework.tensor.Tensor'> but found NestedDict.
print("algo.compute_single_action()",algo.compute_single_action({"obs": [np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                                 np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])]}))
"""
#AttributeError: 'NoneType' object has no attribute 'transform'
print("algo.compute_actions()", algo.compute_actions({"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])}))
#print("algo.compute_actions()",algo.compute_actions({"observations": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])}))
#print("algo.compute_actions()",algo.compute_actions({"observations": [np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]}))
#print("algo.compute_actions()",algo.compute_actions({"observations": [[np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]]}))

#AttributeError: 'list' object has no attribute 'items'
#print("algo.compute_actions()",algo.compute_actions([[np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]]))
#"""

"""
print("algo.compute_actions()", algo.compute_actions({"observations": [np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                       np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])]}))
#"""

"""
print("algo.compute_single_action()",algo.compute_actions({"observations": np.array([[np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()],
                                                                                           [np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]])}))
#"""
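
A note on the "MANY" call above: Python keeps only the last value when a key is repeated in a dict literal, so that call most likely hands compute_single_action() a dict with a single "obs" entry, which would explain the single action. A minimal check, independent of RLlib (the values are made up for illustration):

```python
# A dict literal with a repeated key keeps only the last value.
obs = {
    "obs": [0.1, 0.2, 0.3, 0.4],
    "obs": [0.5, 0.6, 0.7, 0.8],
    "obs": [0.9, 1.0, 1.1, 1.2],
}

print(len(obs))    # how many entries survive
print(obs["obs"])  # the value that was assigned last
```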

TEMPORARY WORKAROUND:

In the file ~\AppData\Local\anaconda3\Lib\site-packages\ray\rllib\algorithms\algorithm.py, around line 1750,

replace the call to worker.preprocessors[policy_id].transform(ob) with the line "preprocessed = ob":

    policy = self.get_policy(policy_id)
    print("ST policy", policy, policy_id)

    filtered_obs, filtered_state = [], []
    for agent_id, ob in observations.items():
        worker = self.workers.local_worker()
        print("ST preprocessors", worker.preprocessors)
        
        # ST: 
        #preprocessed = worker.preprocessors[policy_id].transform(ob)
        preprocessed = ob
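
Since the loop above iterates over observations.items() and treats each key as an agent id, one untested idea is to give every saved observation its own key instead of reusing "obs". Below is a minimal sketch of that conversion. batch_to_obs_dict is a hypothetical helper, not part of RLlib, and whether algo.compute_actions() accepts the result still depends on the preprocessor issue described above.

```python
import numpy as np


def batch_to_obs_dict(batch):
    """Hypothetical helper: map each row of a 2D batch to its own integer key."""
    batch = np.asarray(batch, dtype=np.float32)
    if batch.ndim != 2:
        raise ValueError("expected a 2D array of shape (n_obs, obs_dim)")
    # One entry per observation; the row index stands in for the agent id.
    return {i: row for i, row in enumerate(batch)}


# Stand-in for the pre-saved observations (12 CartPole-sized vectors).
saved_obs = np.random.rand(12, 4)
obs_dict = batch_to_obs_dict(saved_obs)
print(len(obs_dict), obs_dict[0].shape)
```

The resulting dict could then be tried as algo.compute_actions(obs_dict). That call is an assumption based on the code path shown above, not a confirmed fix.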
