I'm running a regex through some log files. The capture groups should capture some relevant fields. I'd like to know whether the log file mentions a successful end of the job. This can be determined from the presence or absence of the string "Job executed successfully".
My regex so far:
^Job started at\s'(\d+\s\d+:\d+:\d+:\d+)'\s+orderno\s+-\s+'(\w+)'\s+runno\s+-\s+'(\d+)'[\s\S]+Host1\s'([\w.]+)'\[([\w-]+)\] username '([\w\\]+)' - Host2\s'([\w.]+)'\[([\w-]+)\] username '([\w\\]+)'[\s\S]+(Job executed successfully)?[\s\S]+Job ended at\s'(\d+\s\d+:\d+:\d+:\d+)'\s+Elapsed time\s\[([\d.]+)sec\]\sCPU usage\s\[([\d.]+)sec]
(I'm kind of new to regex, so it will not be perfect at all and needs some hardening)
A sample log with a successful ending. The regex above only works here when the question mark after "(Job executed successfully)" is removed, which should not be necessary in my opinion:
Job started at '0902 23:56:00:367' orderno - '0tzh0' runno - '00064' Number of transfers - 1
Host1 'Local'[Windows-LOCAL] username 'xxx\xxx' - Host2 'xxx.xxx.xx'[Unix-SFTP] username 'xxx'
Local host is: xxx - Windows 200x [601] Service Pack 1 build 7601 - Intel64 Family 6 Model 37 Stepping 1, GenuineIntel
********** Starting transfer #1 out of 1 *************** Transfer #1 completed successfully
Job executed successfully. exiting.
Job ended at '0902 23:56:07:138' Elapsed time [7sec] CPU usage [0.15sec]
A sample log with an unsuccessful ending. Here the regex above works as it should:
Job started at '0831 15:26:00:365' orderno - '0tuq5' runno - '00030' Number of transfers - 4
Host1 'Local'[Windows-LOCAL] username 'xxx\xxx' - Host2 'xxx.xxx.xx'[Unix-SFTP] username 'xxx'
Local host is: xxx - Windows 200x [601] Service Pack 1 build 7601 - Intel64 Family 6 Model 37 Stepping 1, GenuineIntel
********** Starting transfer #1 out of 4 *************** Unable to connect to SSH server on 'xxx.xxx.xx': SFTP_Connect : psftp_connect failed : ssh_init: Network error: Connection timed out .
Connection to host sftp.onenet.be could not be established
Job ended at '0831 15:26:21:426'
Elapsed time [21sec] CPU usage [0.0sec]
With minimal change to your regex, you could use this one:
(Main changes indicated by ^ above.) I also converted some quantifiers to lazy, which should make things a little faster.
regex101 demo
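As a rough sketch of this kind of change in Python (not necessarily the exact regex meant above): the greedy [\s\S]+ in front of the optional group is replaced by a character-by-character scan that stops at "Job executed successfully" or at "Job ended at", and a few other quantifiers are made lazy. The names PATTERN, SUCCESS_LOG and FAILURE_LOG are made up for this example; the logs are the two samples from the question.

```python
import re

# Same fields as the question's regex; only the part around the optional
# "Job executed successfully" group is changed (assumption: a tempered scan).
PATTERN = re.compile(
    r"^Job started at\s'(\d+\s\d+:\d+:\d+:\d+)'"
    r"\s+orderno\s+-\s+'(\w+)'\s+runno\s+-\s+'(\d+)'"
    r"[\s\S]+?Host1\s'([\w.]+)'\[([\w-]+)\] username '([\w\\]+)'"
    r" - Host2\s'([\w.]+)'\[([\w-]+)\] username '([\w\\]+)'"
    # Walk forward one character at a time, stopping right before either marker.
    r"(?:(?!Job executed successfully|Job ended at)[\s\S])*"
    r"(Job executed successfully)?"   # group 10: None when the job failed
    r"[\s\S]*?"
    r"Job ended at\s'(\d+\s\d+:\d+:\d+:\d+)'"
    r"\s+Elapsed time\s\[([\d.]+)sec\]\sCPU usage\s\[([\d.]+)sec\]",
    re.MULTILINE,
)

# Raw strings, because the usernames contain a backslash.
SUCCESS_LOG = r"""
Job started at '0902 23:56:00:367' orderno - '0tzh0' runno - '00064' Number of transfers - 1
Host1 'Local'[Windows-LOCAL] username 'xxx\xxx' - Host2 'xxx.xxx.xx'[Unix-SFTP] username 'xxx'
Local host is: xxx - Windows 200x [601] Service Pack 1 build 7601 - Intel64 Family 6 Model 37 Stepping 1, GenuineIntel
********** Starting transfer #1 out of 1 *************** Transfer #1 completed successfully
Job executed successfully. exiting.
Job ended at '0902 23:56:07:138' Elapsed time [7sec] CPU usage [0.15sec]
"""

FAILURE_LOG = r"""
Job started at '0831 15:26:00:365' orderno - '0tuq5' runno - '00030' Number of transfers - 4
Host1 'Local'[Windows-LOCAL] username 'xxx\xxx' - Host2 'xxx.xxx.xx'[Unix-SFTP] username 'xxx'
Local host is: xxx - Windows 200x [601] Service Pack 1 build 7601 - Intel64 Family 6 Model 37 Stepping 1, GenuineIntel
********** Starting transfer #1 out of 4 *************** Unable to connect to SSH server on 'xxx.xxx.xx': SFTP_Connect : psftp_connect failed : ssh_init: Network error: Connection timed out .
Connection to host sftp.onenet.be could not be established
Job ended at '0831 15:26:21:426'
Elapsed time [21sec] CPU usage [0.0sec]
"""

for name, log in (("success", SUCCESS_LOG), ("failure", FAILURE_LOG)):
    m = PATTERN.search(log)
    # Group 10 is the optional "Job executed successfully" capture.
    print(name, "->", m.group(10) is not None, "ended at", m.group(11))
```

Listing "Job ended at" in the lookahead as well keeps the scan from running past the end of the current job, which should matter if one file holds several jobs.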
Your current regex would match everything till the end due to greedy matching of [\s\S]+ and backtrack (from right to left) and test for (Job executed successfully)?[\s\S]+, and there, [\s\S]+ will match as soon as Job ended gets found. In the above way, we check from left to right each character until we get to the part we need, i.e. Job executed successfully if it exists.
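The difference can be reproduced on a much smaller input. The following Python sketch uses a shortened, made-up log line and two simplified patterns (both are illustrative assumptions, not the full regexes from this thread):

```python
import re

text = "start ... Job executed successfully ... Job ended"

# Greedy version: [\s\S]+ runs to the end, then backs up only as far as
# needed for "Job ended" to match. The optional group is allowed to match
# nothing, so it is never forced to capture the success string.
greedy = re.search(
    r"start[\s\S]+(Job executed successfully)?[\s\S]+Job ended", text
)

# Left-to-right version: advance one character at a time and stop right
# before either marker, so the optional group is tried at the right spot.
tempered = re.search(
    r"start(?:(?!Job executed successfully|Job ended)[\s\S])*"
    r"(Job executed successfully)?[\s\S]*?Job ended",
    text,
)

print(greedy.group(1))    # the group did not participate in the match
print(tempered.group(1))  # the success string is captured
```

Both patterns match the text as a whole; they differ only in whether the optional group ends up taking part in the match.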