I am new to Databricks. I am looking for a public big data dataset for my school project, and I came across an AWS public dataset at this link: https://registry.opendata.aws/target/
I am using Python on Databricks, and I don't know how to establish a connection to the data. I have found the following how-to guide:
I am not sure how to find the respective access_key, secret_key, AWS_bucket_name and the mount_name.
This documentation is for non-public S3 buckets.
For this dataset you can simply read the data using the `s3://...` URL. I used the `text` file format just as an example, but because this dataset stores its data as XML, you'll need to use something like the spark-xml library to extract the necessary data.
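A minimal sketch of what that read could look like in a Databricks Python notebook. The bucket name, prefix, and row tag below are hypothetical placeholders (take the real values from the dataset's registry page), and the sketch assumes the cluster is able to reach the public bucket and, for the XML variant, that the spark-xml library is installed on the cluster:

```python
# Hypothetical placeholders: replace with the bucket and prefix listed on
# the dataset's Registry of Open Data page.
BUCKET = "<bucket-name>"
PREFIX = "<path/inside/bucket>"


def s3_url(bucket: str, prefix: str = "") -> str:
    """Build an s3:// URL from a bucket name and an optional key prefix."""
    prefix = prefix.strip("/")
    return f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}/"


def read_raw_text(spark, url: str):
    """Read the files under `url` as plain text, one DataFrame row per line."""
    return spark.read.format("text").load(url)


def read_xml(spark, url: str, row_tag: str):
    """Parse XML files with spark-xml (assumed to be installed on the cluster).

    `row_tag` is the XML element that should become one DataFrame row; the
    right value depends on the dataset's schema.
    """
    return (
        spark.read.format("com.databricks.spark.xml")
        .option("rowTag", row_tag)
        .load(url)
    )


# In a Databricks notebook the `spark` session is already defined, so usage
# would look like this (left commented out because the values are placeholders):
# df = read_raw_text(spark, s3_url(BUCKET, PREFIX))
# df.show(5, truncate=False)
# xml_df = read_xml(spark, s3_url(BUCKET, PREFIX), row_tag="<row-element>")
```

Reading as `text` first is a cheap way to confirm that the path is right and to look at the raw XML before choosing a row tag for spark-xml.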