I'm developing an ETL. The first step is a Text File Input that adds some metadata to the stream from "Additional output fields", including the filename and the last-modified date.
I need to query the database to check whether that filename with that last-modified datetime has already been processed. If so, the stream must stop and the following steps must not run.
Is that possible? I've googled it and found no examples.
Pentaho processes all steps in parallel, so this kind of linear abstraction can be a little confusing.
What you need to do is return no rows if you don't want to continue processing. If subsequent steps receive 0 rows, they will do nothing.
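To make the "zero rows" idea concrete, here is a minimal sketch in plain Python. It is not Pentaho code, only an illustration of the row-filtering logic. The table name `processed_files`, its columns, and the helper names are all assumptions made up for this example.

```python
import sqlite3


def already_processed(conn, filename, last_modified):
    """Return True if this (filename, last_modified) pair is already logged."""
    cur = conn.execute(
        "SELECT 1 FROM processed_files WHERE filename = ? AND last_modified = ?",
        (filename, last_modified),
    )
    return cur.fetchone() is not None


def drop_processed(rows, conn):
    """Pass through only rows whose source file has not been processed yet."""
    for row in rows:
        if not already_processed(conn, row["filename"], row["last_modified"]):
            yield row


def make_demo_db():
    """Build an in-memory database with one file marked as processed (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_files (filename TEXT, last_modified TEXT)")
    conn.execute(
        "INSERT INTO processed_files VALUES ('a.csv', '2020-01-01 10:00:00')"
    )
    return conn
```

If every incoming row comes from a file that is already logged, `drop_processed` yields nothing, so whatever consumes its output has nothing to do. That is the same effect you want from the filtering step in the transformation.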
There are some ways to do this: