I'm trying to implement Unit Tests using Pytest, Moto (4.1.6) and s3fs (0.4.2) for my functions that interact with S3.
So far I am able to create a bucket and populate it with all the files that live in the data folder.
Unfortunately one of my requirements is that I need to access the bucket with the s3fs.core.S3FileSystem object class since that's how our internal library works and I'm trying to stay as close as possible to the original environment.
That wouldn't be a problem if I didn't get access denied when I try to access the fake bucket.
Here's the relevant code from `conftest.py`:
```python
#!/usr/bin/env python3
from moto import mock_s3
from pathlib import Path

import boto3
import os
import pytest
import s3fs


@pytest.fixture(scope="session")
def test_data_folder():
    return os.path.join(os.path.dirname(__file__), "data")


@pytest.fixture(scope="session")
@mock_s3
def s3_filesystem(test_data_folder):
    connection = boto3.client("s3", region_name="us-east-1")
    connection.create_bucket(Bucket="bucket")
    for path in Path(test_data_folder).rglob("*"):
        if path.is_file():
            with open(path, "rb") as parquet:
                data = parquet.read()
            connection.put_object(Bucket="bucket", Key=str(path), Body=data)
    bucket = boto3.resource("s3").Bucket("bucket")
    for object in bucket.objects.all():
        print(object.key)
    filesystem = s3fs.S3FileSystem(anon=True)
    filesystem.ls(test_data_folder)
    return filesystem
```
After this code runs, I can see in the output of my print that several files like this exist in the bucket:

```
/Users/campos/repos/project/tests/data/20221027/transactions_test_20221027.parquet
```

I want to return the `s3fs.core.S3FileSystem` object to my tests, but when I try to run `filesystem.ls(test_data_folder)` in my debugger I get:

```
*** PermissionError: All access to this object has been disabled
```
Going a little deeper, the objects returned from `bucket.objects.all()` look like this:

```
s3.ObjectSummary(bucket_name='bucket', key='/Users/campos/repos/project/tests/data/20221027/orders_test_20221027.parquet')
```
I already tried adding a public access control list to the bucket creation, like this: `s3.create_bucket(Bucket="bucket", ACL="public-read")`, but it didn't change anything.
I also saw this in the error messages:

```
api_params = {'Bucket': 'Users', 'Delimiter': '/', 'EncodingType': 'url', 'Prefix': 'ykb595/repos/gfctr-card-lob-pyexetl/tests/data/'}
...
botocore.errorfactory.NoSuchBucket: An error occurred (NoSuchBucket) when calling the ListObjectsV2 operation: The specified bucket does not exist
```
It's very clear that my files exist somewhere in some sort of bucket, but it looks like something is unable to find that bucket.
What am I missing?
Thank you in advance!!
Change this: `filesystem.ls(test_data_folder)` to `filesystem.ls("bucket")`.

The `test_data_folder` resolves to `/home/username/..` (for me), which means that S3FS tries to find a bucket called `home`. But the bucket name that you're using everywhere else is `bucket`.

Currently, the bucket structure created in S3/Moto looks like this:
```
/ -> home/ -> folder1/ -> folder2/ -> etc
```

I don't think S3FS likes this very much, either the structure itself or the fact that the first folder is named `/`. When removing the `mock_s3` decorator and testing it against AWS itself, the call to `filesystem.ls(bucket_name)` fails horribly, and `filesystem.walk()` returns the same strange result: `[('bucket', [], [''])]`.

When I create a flat structure in S3, `filesystem.walk()` does work as expected (both against AWS and against Moto).
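One way to get such a flat structure in the fixture from the question is to upload each file under a key that is *relative* to the data folder instead of its absolute path. A minimal stdlib-only sketch (the `relative_keys` helper is hypothetical, mine rather than anything from S3FS or Moto):

```python
from pathlib import Path


def relative_keys(data_folder):
    """S3 keys relative to the data folder, so none of them starts with '/'."""
    root = Path(data_folder)
    return sorted(
        # as_posix() guarantees forward slashes, which is what S3 keys use
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )
```

In the fixture, calling `connection.put_object(Bucket="bucket", Key=key, ...)` with such a key produces `20221027/transactions_test_20221027.parquet` instead of a key starting with `/Users/...`, so `filesystem.ls("bucket")` sees an ordinary bucket layout.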
On a general note, you asked for this

> to better understand, study and implement testing with AWS following best practices.

My recommendation is always to verify the behaviour against AWS first if you're unsure about the expected result. Open-source tools like S3FS and Moto are great for writing unit tests and verifying that known behaviour doesn't change. But if you don't know the expected result, you can never be sure whether the strange things you're seeing come from S3FS, Moto, or AWS.
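As a footnote on the `NoSuchBucket` error in the question (`'Bucket': 'Users'`): S3FS treats the first component of the path you give it as the bucket name. A rough illustration of that split with a hypothetical helper (not S3FS's actual code):

```python
from pathlib import PurePosixPath


def split_bucket(path):
    """Treat the first path component as the bucket, the rest as the key prefix.

    Roughly mimics how S3FS interprets a path passed to ls().
    """
    parts = PurePosixPath(path.lstrip("/")).parts
    return parts[0], "/".join(parts[1:])


# An absolute local path makes S3FS look for a bucket called "Users":
print(split_bucket("/Users/campos/repos/project/tests/data"))
# whereas a path that starts with the real bucket name points at "bucket":
print(split_bucket("bucket/20221027/transactions_test_20221027.parquet"))
```

This is exactly why `filesystem.ls(test_data_folder)` ends up asking AWS (or Moto) for a bucket named `Users`.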