Best practices to handle errors in NIFI

3.7k views Asked by At

I'm using NIFI, and i have data flows where I use the following processos :

  • ExecuteScript
  • RouteOnAttribute
  • FetchMapDistribuedCache
  • InvokeHTTPRequest
  • EvaluateJSONPath

and two level process group like NIFI FLOW >>> Process group 1 >>> Process group 2, my question is how to handle errors in this case, I have created output port for each processor to output errors outside the process group and in the NIFI Flow I have done a funnel for each error type and then put all those errors catched in Hbase so i can do some reporting later on, and as you can imagine this add multiples relationships and my simple dataflow start to became less visible.

My questions are, what's the best practices to handle errors in processors, and what's the best approach to do some error reporting using NIFI ( Email or PDF )

1

There are 1 answers

1
Andy On

It depends on the errors you routinely encounter. Some processors may fail to perform a task (an expected but not desired outcome), and route the failed flowfile to REL_FAILURE, a specific relationship which can be connected to a processor to handle these failures, or back to the same processor to be retried. Others (or the same processors in different scenarios) may encounter exceptions, which are unexpected occurrences which cannot be resolved by the processor.

An example of this is PutKafka vs. EncryptContent. If the remote Kafka system is temporarily unavailable, the processor would fail to send the flowfile content. However, retrying after some delay period could be successful if the remote system is once again available. However, decrypting cipher text with the wrong key will always throw an exception, no matter how many times it is attempted or how long the retry delay is.

Many users route the errors to PutEmail processor and report them to a specific user/group who can evaluate the errors and monitor the data flow if necessary. You can also use "Reporting Tasks" to monitor metrics or ingest provenance data as operational data and route that to email/offline storage, etc. to run analytics on it.