Can a Glue Workflow or Trigger get parameters from EventBridge?


My system design

  • I have created 4 Glue jobs: testgluejob1, testgluejob2, testgluejob3 and common-glue-job.
  • An EventBridge rule detects the SUCCEEDED state of Glue jobs such as testgluejob1, testgluejob2 and testgluejob3.
  • After the job's SUCCEEDED notification arrives, a Glue Trigger runs to start common-glue-job.

Problem

  • I want to use the name of the job that succeeded as a parameter in the common-glue-job script.
  • Is it possible to pass parameters from EventBridge to a Glue Workflow or Trigger?

The things I tried

       Type: AWS::Glue::Trigger
       ...
         Actions:
           - JobName: prod-job2
             Arguments:
               '--job-bookmark-option': job-bookmark-enable
  • If Run Properties are set on the Glue Workflow, I can read them from common-glue-job using boto3 and the get_workflow_run_properties() function. But I have no idea how to set Run Properties from EventBridge through CloudFormation (CFn).
    https://docs.aws.amazon.com/glue/latest/dg/workflow-run-properties-code.html
  • I set an InputTransformer on the EventBridge rule's target, but I'm not sure how to use this value in common-glue-job.
DependsOn:
  - EventBridgeGlueExecutionRole
  - GlueWorkflowTest01
Type: AWS::Events::Rule
Properties:
  Name: EventRuleTest01
  EventPattern:
    source:
      - aws.glue
    detail-type:
      - Glue Job State Change
    detail:
      jobName:
        - !Ref GlueJobTest01
      state:
        - SUCCEEDED
  Targets:
    - Arn: !Sub arn:aws:glue:${AWS::Region}:${AWS::AccountId}:workflow/${GlueWorkflowTest01}
      Id: GlueJobTriggersWorkflow
      RoleArn: !GetAtt 'EventBridgeGlueExecutionRole.Arn'
      InputTransformer:
        InputTemplate: >-
          {
            "--ORIGINAL_JOB": <jobName>
          }
        InputPathsMap:
          jobName: "$.detail.jobName"

Any help would be greatly appreciated.

There is 1 answer

Answered by ChildishGirl

If I understand you correctly, the information you need is already in the EventBridge event, but you cannot access it from your Glue job. I used the following workaround:

  1. Get the event ID from the Glue workflow run properties.
import sys
import boto3
from awsglue.utils import getResolvedOptions

glue_client = boto3.client('glue')
event_client = boto3.client('cloudtrail')

# WORKFLOW_NAME and WORKFLOW_RUN_ID are passed to jobs that run inside a workflow.
args = getResolvedOptions(sys.argv, ['WORKFLOW_NAME', 'WORKFLOW_RUN_ID'])
run_properties = glue_client.get_workflow_run_properties(
    Name=args['WORKFLOW_NAME'], RunId=args['WORKFLOW_RUN_ID'])['RunProperties']
# [1:-1] strips the first and last character that wrap the event ID.
event_id = run_properties['aws:eventIds'][1:-1]
  2. Get all NotifyEvent events from the last several minutes. It is up to you to decide how much time can pass between the workflow start and your job start.
import datetime

# Use timezone-aware UTC times; boto3 treats naive datetimes as UTC.
end_time = datetime.datetime.now(datetime.timezone.utc)
response = event_client.lookup_events(
    LookupAttributes=[{'AttributeKey': 'EventName', 'AttributeValue': 'NotifyEvent'}],
    StartTime=end_time - datetime.timedelta(minutes=5),
    EndTime=end_time)['Events']
  3. Check which event encloses an event with the event ID we got from the Glue workflow.
import json

event = None
for record in response:
    event_payload = json.loads(record['CloudTrailEvent'])['requestParameters']['eventPayload']
    if event_payload['eventId'] == event_id:
        event = json.loads(event_payload['eventBody'])  # eventBody is a JSON string
        break
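The matching loop in step 3 can also be wrapped in a small helper, which makes it testable without AWS access. This is only a minimal sketch: it assumes the CloudTrail records have the same shape as in the loop above (a JSON string under CloudTrailEvent with requestParameters.eventPayload), and the function name find_triggering_event is made up for illustration.

```python
import json


def find_triggering_event(records, event_id):
    """Return the parsed event whose ID matches event_id, or None.

    `records` is the 'Events' list returned by CloudTrail lookup_events.
    The payload layout is an assumption taken from the loop above.
    """
    for record in records:
        trail_event = json.loads(record['CloudTrailEvent'])
        # requestParameters can be null in a CloudTrail record, so guard with `or {}`.
        payload = (trail_event.get('requestParameters') or {}).get('eventPayload') or {}
        if payload.get('eventId') == event_id:
            # eventBody is itself a JSON string holding the original event.
            return json.loads(payload['eventBody'])
    return None
```

With the variables from the steps above, the call would be event = find_triggering_event(response, event_id). A result of None means no matching record was found in the time window; since lookup_events returns at most 50 events per call, it may be necessary to widen the window or follow NextToken to fetch more pages.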

The event variable now holds the full content of the event that triggered the workflow.
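To get back to the original question, the name of the job that finished can then be read from that event. A minimal sketch, assuming the event is a Glue Job State Change event carrying the detail.jobName and detail.state fields used in the rule's event pattern; original_job_name is a hypothetical helper name and the sample values are illustrative.

```python
def original_job_name(event):
    """Return the job name from a Glue Job State Change event, or None otherwise."""
    if event.get('detail-type') != 'Glue Job State Change':
        return None
    return event.get('detail', {}).get('jobName')


# Sample event, trimmed to the fields used here (values are illustrative).
sample_event = {
    'source': 'aws.glue',
    'detail-type': 'Glue Job State Change',
    'detail': {'jobName': 'testgluejob1', 'state': 'SUCCEEDED'},
}
print(original_job_name(sample_event))  # testgluejob1
```

Inside common-glue-job, the returned string can be used wherever the script needs to know which upstream job succeeded.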