Pandas 1.4.2 raises an s3fs ImportError while reading a CSV from an S3 bucket


I am experiencing an issue with the latest pandas release, 1.4.2, while reading a CSV file from S3.

I am using the AWS Lambda Python 3.8 runtime, which comes with the following boto3 and botocore versions:

boto3 - 1.20.32
botocore - 1.23.32

And here are the pandas and s3fs versions installed with the deployed package zip file:

pandas - 1.4.2
s3fs - 2022.3.0

With this latest pandas release, pandas.read_csv throws the error "ImportError: Install s3fs to access S3".

If I pin the pandas and s3fs versions as below, the error goes away and the Lambda handler function is able to read the CSV without any error:

pandas - 1.4.1
s3fs - 2022.2.0
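For a reproducible deployment, the working combination can be pinned in the requirements file used to build the package zip. This is a sketch based on the versions reported above; note that s3fs uses calendar versioning, so the pins assume the 2022.x releases:

```text
# requirements.txt - pin the combination that worked
pandas==1.4.1
s3fs==2022.2.0
```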

Has anyone experienced the same error? Also, what is the best practice for identifying compatible versions of each dependency package?

I have not shared the full error details; please let me know if you need more information about this issue.

Thanks!


1 Answer

Answered by Daniel Weigel

I have experienced this issue when attempting to write a dataframe from a Lambda function to a CSV in S3:

df.to_csv('s3://mys3bucket/dummy.csv', index=True) 

We need s3fs, which stands for 'S3 file system', to handle interactions with S3 bucket files when working with pandas.
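If pinning s3fs turns out to be fragile, one way to sidestep the dependency entirely is to go through boto3, which is already bundled in the Lambda Python runtime. This is a minimal sketch (bucket and key names are placeholders), not part of the original answer:

```python
import io

import pandas as pd


def csv_bytes_to_df(data: bytes) -> pd.DataFrame:
    # Plain pandas parsing from an in-memory buffer; s3fs is not involved.
    return pd.read_csv(io.BytesIO(data))


def read_csv_from_s3(bucket: str, key: str) -> pd.DataFrame:
    import boto3  # already available in the Lambda Python runtime

    obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return csv_bytes_to_df(obj["Body"].read())


def write_df_to_s3(df: pd.DataFrame, bucket: str, key: str) -> None:
    import boto3

    # Serialize locally, then upload the bytes; again no s3fs needed.
    buf = io.StringIO()
    df.to_csv(buf, index=True)
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=buf.getvalue())
```

In a handler you would call, e.g., `df = read_csv_from_s3("mys3bucket", "dummy.csv")`, trading the convenience of `pd.read_csv("s3://…")` for one fewer third-party dependency to version-match.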

As for pandas, this is a known issue, since it is not always easy to find versions compatible with Lambda. The easiest fix is to add an AWS-managed layer called AWSDataWrangler to your Lambda function.

Alternatively, you can create your own layer by zipping the wheels for pytz (https://pypi.org/project/pytz/#files) and pandas (https://pypi.org/project/pandas/1.0.3/#files) into a single .zip file, uploading it as a custom layer, and then attaching that layer to your Lambda function.
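The custom-layer route can be sketched as the following build steps. The package version, layer name, and function name are illustrative; Lambda expects layer libraries under a top-level `python/` directory:

```shell
# Build a Lambda layer zip with pinned pandas (pytz is pulled in
# automatically as a pandas dependency).
mkdir -p python
pip install --target python "pandas==1.4.1"
zip -r -q pandas-layer.zip python
# Publish and attach it (requires the AWS CLI; names are placeholders):
# aws lambda publish-layer-version --layer-name pandas-layer \
#     --zip-file fileb://pandas-layer.zip --compatible-runtimes python3.8
```

Installing with `--target python` before zipping is what makes the packages land on the Lambda runtime's `PYTHONPATH` once the layer is attached.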