Log analysis: finding lines by time difference


I have a long log file generated with log4j, with 10 threads writing to the log. I am looking for a log analyzer tool that can find lines where the user waited a long time (i.e. where the difference between consecutive log entries for the same thread is more than a minute).

P.S. I am trying to use OtrosLogViewer, but it only filters by specific values (for example, by thread ID) and does not compare lines with each other.

P.P.S. The new version of OtrosLogViewer has a "Delta" column that shows the difference between adjacent log lines (in ms).

Thank you.


There are 3 answers

Raffaele On BEST ANSWER

This simple Python script may be enough. For testing, I analyzed my local Apache log, which, by the way, uses the Common Log Format, so you may even be able to reuse the script as-is. I simply compute the difference between two subsequent requests and print the request line when the delta exceeds a certain threshold (1 second in my test). You may want to encapsulate the code in a function that also accepts the thread ID as a parameter, so you can filter further.

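#!/usr/bin/env python
import re
from datetime import datetime

THRESHOLD = 1  # seconds

last = None
with open("/var/log/apache2/access.log") as log:
    for line in log:
        # You may insert here something like
        # if not re.match(THREAD_ID, line):
        #   continue
        stamp = re.search(r"\[([^]]+)]", line)
        if not stamp:
            continue  # skip lines without a bracketed timestamp
        # Drop the UTC offset with [:-6]; strptime in Python 2 does not support %z
        current = datetime.strptime(stamp.group(1)[:-6], "%d/%b/%Y:%H:%M:%S")
        # total_seconds() measures the whole gap; .seconds would drop full days
        if last is not None and (current - last).total_seconds() > THRESHOLD:
            request = re.search('"([^"]+)"', line)
            # Fall back to the whole line if no quoted request is present
            print(request.group(1) if request else line.rstrip())
        last = current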
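The per-thread variant suggested above could look roughly like the sketch below. It assumes a log4j layout such as `2024-01-15 10:23:45,123 [worker-1] INFO message`; the question does not show the actual pattern, so the regular expression, the function name `find_gaps` and the thread names are hypothetical and would need adjusting to the real file. It remembers the last entry seen for each thread and reports any entry that follows its predecessor on the same thread by more than the threshold.

```python
import re
from datetime import datetime

# Assumed layout (not given in the question):
#   2024-01-15 10:23:45,123 [worker-1] INFO message
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) \[([^\]]+)\]"
)


def find_gaps(lines, threshold_seconds=60):
    """Yield (thread, gap_seconds, previous_line, line) for each entry that
    follows the previous entry of the same thread by more than the threshold."""
    last_seen = {}  # thread name -> (timestamp, raw line)
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # e.g. stack-trace continuation lines
        stamp, millis, thread = match.groups()
        current = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(
            microsecond=int(millis) * 1000
        )
        if thread in last_seen:
            previous_time, previous_line = last_seen[thread]
            gap = (current - previous_time).total_seconds()
            if gap > threshold_seconds:
                yield thread, gap, previous_line.rstrip(), line.rstrip()
        last_seen[thread] = (current, line)
```

To run it on a file, pass the open file object, for example `for thread, gap, before, after in find_gaps(open("app.log")): print(thread, gap, before, after, sep="\n")`, where `app.log` is a placeholder path.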
Noam Manos On

Based on @Raffaele's answer, I made some fixes so it works on any log file (skipping lines that don't begin with the requested date, e.g. in a Jenkins console log). In addition, I added max/min thresholds to filter lines based on duration limits.

#!/usr/bin/env python
import re
from datetime import datetime

# Report gaps whose length in seconds lies within [MIN_THRESHOLD, MAX_THRESHOLD]
MIN_THRESHOLD = 80
MAX_THRESHOLD = 100

regCompile = re.compile(r"\w+\s+(\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d).*")
filePath = "C:/Users/user/Desktop/temp/jenkins.log"

lastTime = None
lastLine = ""

with open(filePath, 'r') as f:
    for line in f:
        regexp = regCompile.search(line)
        if regexp:  # lines without a timestamp are skipped
            currentTime = datetime.strptime(regexp.group(1), "%Y-%m-%d %H:%M:%S")

            if lastTime is not None:
                # total_seconds() measures the whole gap; .seconds would drop full days
                duration = (currentTime - lastTime).total_seconds()
                if MIN_THRESHOLD <= duration <= MAX_THRESHOLD:
                    print ("#######################################################################################################################################")
                    print (lastLine)
                    print (line)
            lastTime = currentTime
            lastLine = line
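One detail worth checking when measuring gaps this way: on a `timedelta`, the `.seconds` attribute holds only the seconds component (0 to 86399) and leaves out whole days, while `total_seconds()` returns the full duration. The small sketch below, using made-up timestamps, shows the difference for a gap that crosses more than one day.

```python
from datetime import datetime

# Made-up timestamps: 1 day, 1 minute and 30 seconds apart
earlier = datetime(2024, 1, 15, 23, 59, 0)
later = datetime(2024, 1, 17, 0, 0, 30)
gap = later - earlier

component = gap.seconds        # seconds part only, whole days excluded
elapsed = gap.total_seconds()  # full duration in seconds, as a float

print(component, elapsed)
```

For logs that never pause for a day or longer the two values agree, so this only matters for long-running or rarely written logs.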
weberjn On

Apache Chainsaw has a time delta column.
