I am experiencing an issue with the latest pandas release, 1.4.2, while reading a CSV file from S3.
I am using the AWS Lambda Python 3.8 runtime, which comes with the following boto3 and botocore versions.
Boto3 - 1.20.32 Botocore - 1.23.32
And, here are the pandas and s3fs versions installed while deploying the packaged zip file.
pandas 1.4.2 s3fs - 22.3.0
With the latest pandas release, pandas.read_csv throws the error "ImportError: Install s3fs to access S3".
If I pin the pandas and s3fs versions as below, the error goes away and the Lambda handler function is able to read the CSV without any problem.
pandas 1.4.1 s3fs - 22.2.0
Has anyone experienced the same error? Or can anyone share the best practice for identifying compatible versions of each dependency package?
I have not included the full error details here. Please let me know if you need more details about this issue.
Thanks!
I experienced this issue when attempting to write a DataFrame from Lambda to a CSV file in S3.
We basically need s3fs which stands for 'S3 file system' to handle interactions with S3 bucket files when playing with pandas.
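Since the error depends on which package versions actually end up in the deployed zip, it can help to log them from inside the Lambda function. Below is a minimal sketch using only the standard library (importlib.metadata, available from Python 3.8). The helper name installed_versions and the list of packages to check are assumptions chosen for illustration, not something from the original post.

```python
from importlib import metadata


def installed_versions(package_names):
    """Return {package name: installed version, or None if it is missing}."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not present in the deployed environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Hypothetical list: the packages mentioned in this thread.
    for name, version in installed_versions(
        ["pandas", "s3fs", "fsspec", "boto3", "botocore"]
    ).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

If s3fs shows up as missing here, the ImportError is about packaging rather than pandas itself. If it is present, comparing these versions against the working combination is a reasonable next step.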
As for pandas, this is a known issue, since it is not always easy to find versions that are compatible with Lambda. The easiest fix is to add an AWS layer called AWSDataWrangler to your Lambda function.
Alternatively, you can create your own layer by zipping the wheels for pytz (https://pypi.org/project/pytz/#files) and pandas (https://pypi.org/project/pandas/1.0.3/#files) into a single .zip file, uploading it as a custom layer, and then attaching the layer to your Lambda function.
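For the custom layer route, here is a rough sketch of building the .zip with Python's zipfile module. It assumes the downloaded wheels have already been unpacked into a local directory (called layer_build here, a hypothetical name), and it places everything under a top-level python/ folder, which is where the Lambda Python runtimes look for layer packages. The function name build_layer_zip is also just an illustration.

```python
import zipfile
from pathlib import Path


def build_layer_zip(packages_dir, zip_path):
    """Zip the contents of packages_dir under a top-level 'python/' folder.

    zip_path should live outside packages_dir so the archive does not
    try to include itself.
    """
    packages_dir = Path(packages_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(packages_dir.rglob("*")):
            if path.is_file():
                # e.g. layer_build/pandas/__init__.py -> python/pandas/__init__.py
                arcname = Path("python") / path.relative_to(packages_dir)
                archive.write(path, arcname.as_posix())
    return zip_path


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever the wheels were unpacked.
    build_layer_zip("layer_build", "pandas_layer.zip")
```

Make sure the wheels match the Lambda platform (Linux builds for the Python version of your runtime), otherwise the compiled parts of pandas will fail to import.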