I have a PySpark job that is scheduled to run through Oozie. I need to include the total row count of the output table created by this workflow in the success email.
I have set this row count as an environment variable in the PySpark code:
import os

# Count the rows of the output table and expose the result as an environment variable
row_count = output_table.count()
print(f"Row count of table: {row_count}")
os.environ["ROW_COUNT"] = str(row_count)
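To check whether this approach can work at all, here is a minimal sketch that is not part of my workflow. It assumes the email action is evaluated in a different process from the PySpark job, which I believe is the case but have not confirmed. The child process, the helper name `row_count_seen_by_child_and_parent`, and the value `42` are hypothetical stand-ins for the real job and its count.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for the PySpark job: a child Python process that sets
# ROW_COUNT in its own environment and prints it back.
CHILD_CODE = (
    "import os\n"
    "os.environ['ROW_COUNT'] = '42'\n"
    "print(os.environ['ROW_COUNT'])\n"
)


def row_count_seen_by_child_and_parent():
    """Return (value seen inside the child, value seen by the parent afterwards)."""
    # Start from a clean state so an existing ROW_COUNT does not mask the result.
    os.environ.pop("ROW_COUNT", None)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    child_value = result.stdout.strip()
    # The parent reads its own environment, which the child cannot modify.
    parent_value = os.environ.get("ROW_COUNT")
    return child_value, parent_value


if __name__ == "__main__":
    child_value, parent_value = row_count_seen_by_child_and_parent()
    print(f"Inside the child process: {child_value}")
    print(f"Back in the parent process: {parent_value}")
```

If the parent never sees the value here, the same may be true for whatever evaluates the email action, since a variable set through `os.environ` only lives in the process that set it and in the children that process starts.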
Here is the action in workflow.xml:
<action name="sendSuccessEmail">
<email xmlns="uri:oozie:email-action:0.1">
<to>${successEmailTo}</to>
<subject>${clusterName}: Workflow succeeded for the date: ${nominal_date}</subject>
<body>
Hi Team,
The workflow completed successfully for the date ${nominal_date}.
Workflow details
----------------
Cluster Name: ${clusterName}
Nominal time: ${nominal_date}
End time: ${timestamp()}
Row count: ${env::ROW_COUNT}
Note: This is an auto-generated email. For any further details, please contact our DE team.
</body>
</email>
<ok to="end"/>
<error to="kill"/>
</action>
Can someone review this? I'm unable to get the row count into the email this way. Suggestions for other ways of doing it are also welcome.