Error in Ray: "ModuleNotFoundError: No module named 'pandas'"


I started Ray in a terminal, inside an environment called p_c that has pandas installed, with the command ray start --head --num-cpus=2 --num-gpus=0

Then I ran the following Python script:

import ray
import os
import pandas as pd
import sys

ray.init(address='auto', redis_password='5241590000000000')

@ray.remote
def foo():
    import pandas as pd
    print("This runs on the VM")
    print(os.getcwd())
    print(sys.path)
    data = pd.read_csv('/Documents/sample.data')
    
    return 1

print("This runs locally")
print(ray.get(foo.remote()))

Running this raised the following error:

WARNING: Logging before InitGoogleLogging() is written to STDERR
    I1014 13:56:23.410329 16563 16563 global_state_accessor.cc:25] Redis server address = 192.168.29.24:6379, is test flag = 0
    I1014 13:56:23.411886 16563 16563 redis_client.cc:146] RedisClient connected.
    I1014 13:56:23.421353 16563 16563 redis_gcs_client.cc:89] RedisGcsClient Connected.
    I1014 13:56:23.423465 16563 16563 service_based_gcs_client.cc:193] Reconnected to GCS server: 192.168.29.24:37125
    I1014 13:56:23.424247 16563 16563 service_based_accessor.cc:92] Reestablishing subscription for job info.
    I1014 13:56:23.424291 16563 16563 service_based_accessor.cc:422] Reestablishing subscription for actor info.
    I1014 13:56:23.424387 16563 16563 service_based_accessor.cc:797] Reestablishing subscription for node info.
    I1014 13:56:23.424415 16563 16563 service_based_accessor.cc:1073] Reestablishing subscription for task info.
    I1014 13:56:23.424441 16563 16563 service_based_accessor.cc:1248] Reestablishing subscription for object locations.
    I1014 13:56:23.424466 16563 16563 service_based_accessor.cc:1368] Reestablishing subscription for worker failures.
    I1014 13:56:23.424504 16563 16563 service_based_gcs_client.cc:86] ServiceBasedGcsClient Connected.
    This runs locally
    Traceback (most recent call last):
      File "hello1.py", line 26, in <module>
        print(ray.get(foo.remote()))
      File "/home/jatin/.local/lib/python3.8/site-packages/ray/worker.py", line 1538, in get
        raise value.as_instanceof_cause()
    ray.exceptions.RayTaskError(ModuleNotFoundError): ray::__main__.foo() (pid=16182, ip=192.168.29.24)
      File "python/ray/_raylet.pyx", line 479, in ray._raylet.execute_task
      File "hello1.py", line 17, in foo
        import pandas as pd
    ModuleNotFoundError: No module named 'pandas'

I have pandas installed in every location I can think of. I cannot work out where the worker is looking for the pandas module or why it does not find it. Without the pandas import, the code runs fine.

1 answer

Answer by pgzmnk:

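Ray runs tasks in worker processes that are started by the Ray runtime, not by the script that calls ray.init. Those workers therefore look for pandas in the Python environment in which ray start was run, regardless of what is installed where the driver script runs. If you launch Ray locally, make sure the required Python libraries are installed in the virtual environment that is active when you run ray start.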

For example:

. .venv/bin/activate
pip install pandas
ray start --num-cpus=8 --object-store-memory=7000000000 --head
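
To confirm which environment the workers actually use, it can help to run the same check on the driver and inside a task. The following is a minimal sketch, not part of the original answer: describe_environment and compare_driver_and_worker are hypothetical helper names, and it assumes a cluster is already running and reachable with ray.init(address="auto") (older Ray versions, such as the one in the question, may also need the password argument).

```python
import sys


def describe_environment(module_name="pandas"):
    """Report which interpreter is running and where module_name would be imported from."""
    # Imported inside the function so it is available when the function runs on a worker.
    import importlib.util

    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,  # path of the Python interpreter in use
        "prefix": sys.prefix,          # root of the active environment
        "module": module_name,
        # None means the module cannot be found by this interpreter.
        "module_location": spec.origin if spec is not None else None,
    }


def compare_driver_and_worker(module_name="pandas"):
    """Run the check locally and inside a Ray task (hypothetical helper; needs a running cluster)."""
    import ray  # imported lazily so describe_environment works without Ray installed

    ray.init(address="auto")  # assumption: same connection style as in the question
    remote_check = ray.remote(describe_environment)
    driver_info = describe_environment(module_name)
    worker_info = ray.get(remote_check.remote(module_name))
    return driver_info, worker_info
```

Calling compare_driver_and_worker() from the driver script returns two dictionaries. If the "executable" or "prefix" values differ, or "module_location" is None only on the worker side, the cluster was started from a different environment than the one where pandas is installed.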