Make a DB INSERT based on Text File Input metadata


I'm developing an ETL and must build some routines to monitor it.

At the beginning, I must make an INSERT into the DB to create a record with the filename and the process start datetime. This query returns the record's PK, which must be stored. When the ETL of that file finishes, I must update that record to indicate that the ETL finished successfully, along with the process end datetime.
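As a rough illustration of this monitoring pattern outside PDI, here is a minimal Python sketch using the standard-library sqlite3 module. The table `etl_control`, its columns and the status values are hypothetical; with another database you would fetch the generated key through whatever mechanism your driver provides.

```python
import sqlite3
from datetime import datetime

# Hypothetical control table; adapt names and types to your database.
DDL = """
CREATE TABLE IF NOT EXISTS etl_control (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    filename    TEXT NOT NULL,
    started_at  TEXT NOT NULL,
    finished_at TEXT,
    status      TEXT NOT NULL
)
"""

def start_run(conn, filename):
    """Insert the 'started' record and return its generated primary key."""
    cur = conn.execute(
        "INSERT INTO etl_control (filename, started_at, status) VALUES (?, ?, ?)",
        (filename, datetime.now().isoformat(), "RUNNING"),
    )
    conn.commit()
    return cur.lastrowid  # PK of the row just inserted

def finish_run(conn, run_id):
    """Mark the record as successfully finished and store the end datetime."""
    conn.execute(
        "UPDATE etl_control SET finished_at = ?, status = ? WHERE id = ?",
        (datetime.now().isoformat(), "SUCCESS", run_id),
    )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    run_id = start_run(conn, "sales_2015_01.txt")
    # ... process the file here ...
    finish_run(conn, run_id)
    print(conn.execute("SELECT id, filename, status FROM etl_control").fetchall())
```

The key point is keeping the value returned by the INSERT (here `run_id`) so that the final UPDATE targets exactly that record.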

I use Text File Input to look for files that match its regex, and add its "Additional output fields" to the stream. But I can't find a step that runs only for the first record and executes a SQL command for the INSERT.

3

There are 3 answers

2
jfneis (best answer)

You can use "Identify last row" and "Filter rows" together, so you keep only one row from your input (filtering just the last one). Your INSERT goes right after the Filter rows step.


Since you will need to split your flow, you'll then need to join your ID column back with the original text input rows.
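The logic of this approach can be sketched in plain Python: flag the last row, keep only that row, run the INSERT once, then join the returned ID back onto every row. This only mimics what the PDI steps do; the field names `is_last` and `run_id` and the `insert` callback are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

def identify_last_row(rows: Iterable[Row], flag: str = "is_last") -> Iterator[Row]:
    """Mimic 'Identify last row': add a flag that is True only on the final row."""
    it = iter(rows)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty input: nothing to flag
    for row in it:
        yield {**prev, flag: False}
        prev = row
    yield {**prev, flag: True}

def insert_once_and_join(rows: List[Row], insert: Callable[[Row], int]) -> List[Row]:
    """Run `insert` for the last row only, then attach its ID to every row."""
    flagged = list(identify_last_row(rows))
    # 'Filter rows' keeps only the flagged row, so the INSERT runs exactly once.
    trigger = [r for r in flagged if r["is_last"]]
    ids = [insert(r) for r in trigger]
    # Join the ID back onto the original rows; there is a single ID row,
    # so this is effectively a cross join.
    return [{**r, "run_id": run_id} for r in rows for run_id in ids]
```

Because the filter leaves exactly one ID row, every input row ends up carrying the same ID.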

1
AlainD

You also have the Unique rows step. If you do not specify a field on which to filter unique values, it will output one and exactly one row.

Now, unless I misunderstood your specs, I'd rather use Kettle's logging system. Click anywhere on the canvas, select Properties from the popup menu, then open the Logging tab. It gives you the status (Started/End/Stop/...) and plenty of additional info, like the number of errors and the number of lines read and written (just tell PDI which step it should take these numbers from).

You can even read, in almost real time, the same information in the DB as you see in PDI's bottom panel. Just select the fields you want and press the SQL button to create the table.

Just note that, for historical reasons, the start date is not really the start date (it's the date of the previous successful run). The actual start date is called Replay date.

Also, if you need this system to monitor the load and decide whether a run should start or not, be aware that on an abrupt termination the system sometimes does not have time to write "End" to the log. Therefore a check like logdate < now - 10 minutes is more reliable.
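The staleness check can be expressed as a small function. This is a sketch under assumptions: the status strings treated as finished ('end', 'stop') and the 10-minute threshold are illustrative and should be matched to what your log table actually contains.

```python
from datetime import datetime, timedelta
from typing import Optional

STALE_AFTER = timedelta(minutes=10)   # threshold suggested above
FINISHED = {"end", "stop"}            # assumed status values for a finished run

def previous_run_active(status: Optional[str], logdate: Optional[datetime],
                        now: datetime) -> bool:
    """Return True if the last logged run should be treated as still running."""
    if logdate is None:
        return False  # no log record at all: nothing is running
    if status is not None and status.lower() in FINISHED:
        return False  # the run logged a proper end
    # No end status, but an abrupt stop may never have written it:
    # only treat the run as alive if it logged something recently.
    return logdate >= now - STALE_AFTER
```

A new run would then be allowed to start whenever `previous_run_active(...)` returns False.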


0
matthiash

To do something for only the first row of a stream, use an 'Add sequence' step (start at 1) followed by a 'Filter rows' step with condition 'seq = 1'.
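A plain-Python sketch of what this step combination does. It is not PDI code; the field name `seq` simply matches the filter condition mentioned here.

```python
from itertools import count
from typing import Dict, Iterable, Iterator

Row = Dict[str, object]

def add_sequence(rows: Iterable[Row], field: str = "seq", start: int = 1) -> Iterator[Row]:
    """Mimic 'Add sequence': attach an increasing counter to each row."""
    for n, row in zip(count(start), rows):
        yield {**row, field: n}

def first_row_only(rows: Iterable[Row]) -> Iterator[Row]:
    """Mimic 'Filter rows' with the condition seq = 1."""
    return (r for r in add_sequence(rows) if r["seq"] == 1)
```

Only the first row passes the filter, so a SQL step placed after it runs a single time per stream.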