I have two Glue databases (db1 and db2) with tables (tbl1 and tbl2) in different AWS regions (eu-west-1 and us-east-1, respectively).
My Glue job in eu-west-1 needs data from both tables, just a simple SELECT * FROM db1.tbl1 and SELECT * FROM db2.tbl2. The data is stored in AWS S3 as Parquet, and I am able to query it via Athena too.
How can I retrieve that data via Spark SQL in a Glue job? Can you help me out with an example? If not Spark SQL, can you please suggest a different approach?
Thanks very much!
Create a crawler in the EU region (eu-west-1) that reads from the S3 bucket in the US region. This creates a table in a Glue database in the EU region whose S3 location points to the US bucket. The data stays in the US region, but your Glue job in the EU can query it like any other catalog table.
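
As a rough sketch of that approach, the crawler could be created with boto3 along these lines. The crawler name, IAM role ARN, bucket path, and the target database name `db2_eu` are all hypothetical placeholders, and the table name the crawler produces is derived from the S3 folder name, so `tbl2` below is an assumption as well.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build the keyword arguments for glue.create_crawler."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(request, region="eu-west-1"):
    """Create the crawler in the EU region and run it once."""
    import boto3  # imported here so the helper above works without boto3

    glue = boto3.client("glue", region_name=region)
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# All values below are placeholders, not taken from the question.
request = build_crawler_request(
    name="us-data-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    database="db2_eu",
    s3_path="s3://my-us-east-1-bucket/path/to/tbl2/",
)

# Needs AWS credentials and a role that can read the US bucket:
# create_and_start_crawler(request)

# Then, inside the Glue job (assuming it runs with the
# --enable-glue-datacatalog job parameter so Spark SQL sees the catalog):
# df1 = spark.sql("SELECT * FROM db1.tbl1")
# df2 = spark.sql("SELECT * FROM db2_eu.tbl2")  # table name is assumed
```

Once the crawler has finished, both tables live in the eu-west-1 catalog, so the job can query them with plain Spark SQL. Reads of the second table still cross regions at the S3 level, which may add latency and data transfer cost.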