In our AWS environment, we have two different types of SAGs (Service Account Groups) for data storage. One SAG is for generic storage, and the other is for secure data that holds only PII or restricted data. We are planning to deploy AWS Glue in this environment. In that case, would we have one metastore spanning both secure and non-secure data? If we needed two metastores, how would that work with Databricks? And if we have a single metastore, how should we handle the secure data? Please provide more details on this.
AWS Glue: Deployment model in an AWS environment
248 views. Asked by Karthikeyan Rasipalay Durairaj. There are 2 answers.
Answer by shuraosipov:
In AWS Glue, each AWS account has one persistent metadata store per region, called the Glue Data Catalog. It contains database definitions, table definitions, job definitions, and other control information used to manage your AWS Glue environment. You manage permissions to those objects using IAM (e.g., who can make GetTable or GetDatabase API calls against them).
In addition to AWS Glue permissions, you would also need to configure permissions to the data itself (e.g., who can make GetObject API calls to the data stored in S3).
So, to answer your questions: yes, you would have a single Data Catalog. However, depending on your security requirements, you can define resource-based and role-based permissions on both the metadata and the content.
You can find a detailed overview here - https://aws.amazon.com/blogs/big-data/restrict-access-to-your-aws-glue-data-catalog-with-resource-level-iam-permissions-and-resource-based-policies
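As a rough sketch of the resource-level approach described above, the snippet below builds an IAM policy document that restricts read access in the Glue Data Catalog to a single database. The database names, account ID, and region are placeholders, not values from the original post; in practice you would attach one such policy to each SAG's role.

```python
import json

# Placeholder identifiers - substitute your own account, region, and
# Glue database names ("generic_db" / "secure_db" are hypothetical).
ACCOUNT_ID = "123456789012"
REGION = "us-east-1"

def catalog_read_policy(database: str) -> dict:
    """Build an IAM policy document granting read-only Glue Data Catalog
    access (GetDatabase / GetTable / GetTables) to a single database."""
    prefix = f"arn:aws:glue:{REGION}:{ACCOUNT_ID}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["glue:GetDatabase", "glue:GetTable", "glue:GetTables"],
                "Resource": [
                    f"{prefix}:catalog",              # the catalog itself
                    f"{prefix}:database/{database}",  # the one database
                    f"{prefix}:table/{database}/*",   # all tables in it
                ],
            }
        ],
    }

# One policy per SAG: roles holding only the generic policy never see
# the secure database's metadata.
generic_policy = catalog_read_policy("generic_db")
secure_policy = catalog_read_policy("secure_db")
print(json.dumps(generic_policy, indent=2))
```

A matching S3 policy (allowing `s3:GetObject` on the corresponding bucket prefixes) would still be needed for the data itself, as the answer notes.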
To integrate your metastore with Databricks for option (1) (a single metastore), you would create two Glue Catalog instance profiles with resource-level access: one instance profile with access to the generic databases and tables, and the other with access to the secure databases and tables.
To integrate your metastores with Databricks for option (2) (two metastores), you would simply create two Glue Catalog instance profiles, each with access to its respective metastore.
The second option is recommended, as it will save you a lot of maintenance cost and human error in the long run. More details on Glue Catalog and Databricks integration are available in the Databricks documentation.
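Under either option, the instance profile is attached to a Databricks cluster at creation time, and a Spark setting tells the runtime to use the Glue Data Catalog as its Hive metastore. Below is a minimal sketch of a Clusters API create payload; the instance profile ARN, cluster name, runtime version, and node type are illustrative placeholders, not values from this thread.

```python
import json

# Placeholder ARN - substitute an instance profile you have already
# registered in the Databricks admin console (one per SAG in option 1,
# one per catalog in option 2).
SECURE_PROFILE_ARN = "arn:aws:iam::123456789012:instance-profile/glue-secure-access"

# Minimal cluster-create payload: the instance profile carries the Glue
# and S3 permissions, and the Spark conf enables the Glue Data Catalog
# as the cluster's metastore.
cluster_spec = {
    "cluster_name": "secure-data-cluster",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
    "aws_attributes": {"instance_profile_arn": SECURE_PROFILE_ARN},
    "spark_conf": {
        "spark.databricks.hive.metastore.glueCatalog.enabled": "true",
    },
}
print(json.dumps(cluster_spec, indent=2))
```

Because the permission boundary lives in the instance profile, switching a cluster between the generic and secure catalogs is just a matter of swapping the ARN in this payload.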
Edit: Based on the discussion in the comments, if you need to access both datasets inside the same Databricks Runtime, option 2 won't work. Option 1 can be used with two permission sets: the first for generic data only, and the second for both generic and secure data.
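The two permission sets from the edit can be sketched as resource lists where the second set is a strict superset of the first, so a cluster whose instance profile carries the second set can read both datasets in one runtime. Database names, account ID, and region are hypothetical placeholders.

```python
# Hypothetical identifiers - replace with your own.
ACCOUNT_ID = "123456789012"
REGION = "us-east-1"
PREFIX = f"arn:aws:glue:{REGION}:{ACCOUNT_ID}"

def db_resources(database: str) -> list:
    """Catalog resources (database + its tables) for one Glue database."""
    return [f"{PREFIX}:database/{database}", f"{PREFIX}:table/{database}/*"]

# Permission set 1: generic data only.
generic_only = [f"{PREFIX}:catalog"] + db_resources("generic_db")

# Permission set 2: generic plus secure data - a superset of set 1,
# for clusters that must see both datasets in the same runtime.
generic_and_secure = generic_only + db_resources("secure_db")
```

Each list would become the `Resource` element of the IAM policy attached to the corresponding instance profile.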