How to add an Amazon S3 data source via REST API?


I have CSV files in a directory of an S3 bucket. I would like to use all of the files as a single table in Dremio. I think this is possible as long as every file has the same header/columns.

Do I need to first add an Amazon S3 data source using the UI, or can I add one as a Source using the Catalog API? (I'd prefer the latter.) The REST API documentation doesn't provide a clear example of how to do this (or I just didn't get it). I have also been unable to find the "New Amazon S3 Source" configuration screen shown in the documentation, perhaps because I'm not logged in as an administrator?

For example, let's say I have a dataset split over two CSV files in an S3 bucket named examplebucket within a directory named datadir:

s3://examplebucket/datadir/part_0.csv
s3://examplebucket/datadir/part_1.csv

Do I somehow set the S3 bucket/path s3://examplebucket/datadir as a data source and then promote each of the files contained therein (part_0.csv and part_1.csv) as a Dataset? Is that sufficient to allow all the files to be used as a single table?


There is 1 answer

James Adams

It turns out that adding a source is only possible for admin users; normal users can't add one. To do what I proposed above, put the files into an S3 bucket that an admin user has already configured as a Dremio source. Then promote the files or the folder to a dataset using the Dremio Catalog API. As far as I can tell, promoting the folder (rather than each file separately) is what makes all the files in it queryable as a single table.
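For anyone who wants to script the promotion step, here is a minimal sketch of what the Catalog API call might look like. It assumes the v3 endpoint `POST /api/v3/catalog/{id}`, where the id of an un-promoted file or folder has the form `dremio:/source/path` and is URL-encoded, and it assumes a token obtained from a prior login. The host `localhost:9047`, the source name `s3source`, and the helper names `build_promotion_request` and `promote` are placeholders of mine, not anything Dremio defines, so check the payload fields against the documentation for your Dremio version.

```python
import json
import urllib.parse
import urllib.request

# Placeholder host; replace with your own Dremio coordinator URL.
DREMIO_URL = "http://localhost:9047"


def build_promotion_request(path_parts):
    """Return (url, payload) for promoting a file or folder to a dataset.

    path_parts is the catalog path as a list, starting with the source name,
    e.g. ["s3source", "examplebucket", "datadir"] (names are hypothetical).
    """
    # Un-promoted entities are addressed as "dremio:/source/folder/...";
    # the id must be fully URL-encoded when used in the request path.
    entity_id = "dremio:/" + "/".join(path_parts)
    encoded_id = urllib.parse.quote(entity_id, safe="")
    url = f"{DREMIO_URL}/api/v3/catalog/{encoded_id}"
    payload = {
        "entityType": "dataset",
        "id": entity_id,
        "path": list(path_parts),
        "type": "PHYSICAL_DATASET",
        # CSV settings: comma-delimited text with a header row.
        "format": {
            "type": "Text",
            "fieldDelimiter": ",",
            "extractHeader": True,
        },
    }
    return url, payload


def promote(path_parts, token):
    """Send the promotion request; token comes from a prior login call."""
    url, payload = build_promotion_request(path_parts)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"_dremio{token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (requires a running Dremio instance and a valid token):
# promote(["s3source", "examplebucket", "datadir"], token="...")
```

Passing the folder path (`datadir`) instead of the individual `part_0.csv` and `part_1.csv` paths is the design choice that matters here: one promoted folder gives one dataset, whereas promoting each file gives one dataset per file.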