We have added a DLQ (dead-letter queue) to our pipeline as a Knative function. Before adding the DLQ, our workflow was:
lambda1 (writes to SQS) -> lambda2 (invoked by SQS, writes to S3, does nothing on failure)
The new workflow is:
lambda1 (writes to SQS) -> lambda2 (invoked by SQS, writes to S3) -> DLQ (retries failed messages and sends them to S3 if they still fail)
This DLQ is not in AWS; we added it in K8s using Knative functions.
The problem: sometimes a few messages from SQS that should succeed never hit lambda2 and go directly to the DLQ. The behaviour is intermittent: sometimes it works and sometimes it does not. Even when it fails, the affected messages are not the same each time; they keep changing.
Before adding this DLQ, everything was working fine.
Lambda timeout: 15 mins
SQS visibility timeout: 15 mins
Batch Size: 10
Report batch item failures: yes
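
With "Report batch item failures" enabled, the handler is expected to return the IDs of the messages it could not process, so only those are retried. Below is a minimal Python sketch of that pattern, not our actual code: write_to_s3 is a hypothetical stand-in for the real S3 write, and the log line is only a suggestion for checking whether a given messageId ever reached the function and how many times SQS had already delivered it.

```python
import json
import logging

logger = logging.getLogger(__name__)


def write_to_s3(body):
    """Hypothetical placeholder for the real S3 write.

    Raises on a payload containing {"fail": true} so the failure
    path can be exercised locally.
    """
    payload = json.loads(body)
    if payload.get("fail"):
        raise ValueError("simulated processing failure")


def handler(event, context=None):
    """SQS-triggered handler using the partial batch response shape."""
    failures = []
    for record in event.get("Records", []):
        message_id = record["messageId"]
        # ApproximateReceiveCount shows how often SQS has delivered this message.
        receive_count = record.get("attributes", {}).get(
            "ApproximateReceiveCount", "unknown"
        )
        logger.info("received messageId=%s receiveCount=%s", message_id, receive_count)
        try:
            write_to_s3(record["body"])
        except Exception:
            logger.exception("processing failed for messageId=%s", message_id)
            # Only the failed message is reported back for retry.
            failures.append({"itemIdentifier": message_id})
    return {"batchItemFailures": failures}
```

If a messageId that ended up in the DLQ never appears in these logs, that would suggest the message was not delivered to the function at all, rather than failing inside it.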
Is there any way to debug this? Since the Lambda code is not being triggered for these messages, we are helpless.