Google Earth Engine: Exporting MODIS images from GEE to an AWS S3 bucket


I am currently working on a machine learning project that uses the MODIS dataset. As my PC doesn't meet the computational requirements of the project, I set up an AWS server. The problem is that Earth Engine exports images to Google Drive or Google Cloud Storage, but I want them exported to my S3 bucket.

I have come across answers suggesting downloading the data to local storage and then uploading it to the S3 bucket. Given the huge datasets and my poor connection speed, that would take ages, so I want to export them from Earth Engine directly to my S3 bucket.

I have gone through the documentation on exporting (ee.batch.Export.image). I am thinking of writing a function that exports GeoTIFF images to an AWS S3 bucket instead of Google Drive or Cloud Storage.
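
For reference, this is roughly what my current export looks like with the Python API; Drive and Cloud Storage are the only destinations the documentation describes, and the bucket name and region below are placeholders:

    # Sketch of my current export; toDrive/toCloudStorage are the only
    # documented targets, so the bucket here has to be a GCS bucket.
    import ee

    ee.Initialize()

    image = ee.Image(ee.ImageCollection("MODIS/006/MOD09A1").first())
    region = ee.Geometry.Rectangle([77.0, 12.0, 78.0, 13.0])  # placeholder AOI

    task = ee.batch.Export.image.toCloudStorage(
        image=image,
        description="mod09a1_export",
        bucket="my-gcs-bucket",    # placeholder; an S3 bucket is not accepted here
        fileNamePrefix="mod09a1",
        region=region,
        scale=500,                 # MOD09A1 native resolution in metres
        maxPixels=1e13,
    )
    task.start()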

P.S.

  • I have already checked the Amazon MODIS Public Datasets, and the datasets I want (MOD09A1 and a few others) aren't offered by Amazon.
  • I have Windows 10 installed on my PC.

1 Answer


MODIS imagery is already on AWS S3 (https://aws.amazon.com/public-datasets/modis/).

However, it is an interesting question for any other dataset, so here are a few things to consider:

1) For now, Google Earth Engine can only write to Google Cloud Storage (GCS) buckets, which are free up to 5 GB, or to Google Drive, which has a 15 GB limit shared with your Gmail. So, to download these images to your local drive before pushing them to AWS S3, you need to make sure you have enough space available on either GCS or Drive.

2) Google Earth Engine does not export metadata, and it will split a large GeoTIFF into tiles if it exceeds certain file-size limits; keep this in mind in case you want to mosaic the split images back into a single image before uploading to AWS (a quick mosaicking sketch follows). You can, however, export image properties separately as a CSV or KML file.
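
As an illustration, here is a minimal sketch of stitching the split tiles back together locally; it assumes rasterio is installed, all tiles share a CRS and band layout, and the file pattern is a placeholder:

    # Sketch: mosaic GEE's split GeoTIFF tiles back into a single image.
    # Assumes rasterio is installed and all tiles share a CRS/band layout.
    import glob

    import rasterio
    from rasterio.merge import merge

    tile_paths = glob.glob("exports/mod09a1*.tif")  # placeholder file pattern
    sources = [rasterio.open(p) for p in tile_paths]

    mosaic, transform = merge(sources)  # (bands, height, width) array

    profile = sources[0].profile
    profile.update(
        height=mosaic.shape[1],
        width=mosaic.shape[2],
        count=mosaic.shape[0],
        transform=transform,
    )

    with rasterio.open("mosaic.tif", "w", **profile) as dst:
        dst.write(mosaic)

    for src in sources:
        src.close()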

3) Once you know you have enough buffer space on GCS or Google Drive, the ideal method would be:

  • Push from EE to Drive/GCS.
  • Pull from Drive/GCS to local storage and then push to AWS. (If you want to do this using Google's network speed instead of your client's resources, you can spin up a small micro instance, which falls under Google's always-free tier; see the sketch after this list.)
  • Another way that avoids using your client's resources is a web-integration service: for example, Zapier can link Drive to AWS so that new files are copied to AWS as they arrive, with the incoming file as the trigger. (I have not tried this, but I know it can be done with Zapier or IFTTT.)
  • Periodically check your cloud storage and clean up: as files get copied off GCS/Drive, check whether the files or folders are synced, then delete them on GCS/Drive to free up space, and repeat the process.
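
Putting the pull, push, and clean-up steps together, here is a minimal sketch of the copy job you could run on that micro instance; it assumes google-cloud-storage and boto3 are installed, credentials for both clouds are configured, and the bucket names and prefix are placeholders:

    # Sketch: copy finished exports from GCS to S3, then free the GCS space.
    # Assumes google-cloud-storage and boto3 are installed and that both
    # GCP and AWS credentials are configured; all names are placeholders.
    import os

    import boto3
    from google.cloud import storage

    GCS_BUCKET = "my-gee-exports"  # placeholder
    S3_BUCKET = "my-s3-bucket"     # placeholder

    gcs = storage.Client()
    s3 = boto3.client("s3")

    for blob in gcs.bucket(GCS_BUCKET).list_blobs(prefix="modis/"):
        if blob.name.endswith("/"):  # skip folder placeholder objects
            continue
        local_path = os.path.join("/tmp", os.path.basename(blob.name))
        blob.download_to_filename(local_path)             # pull from GCS
        s3.upload_file(local_path, S3_BUCKET, blob.name)  # push to S3
        blob.delete()                                     # free GCS quota
        os.remove(local_path)                             # free local disk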

I am sure others will have other great suggestions, but this is just my way of doing it. Hope that helps.

Sam