Logstash: Reading multiline data from optional lines

I have a log file which contains lines which begin with a timestamp. An uncertain number of extra lines might follow each such timestamped line:

SOMETIMESTAMP some data
extra line 1 2
extra line 3 4

The extra lines would provide supplementary information for the timestamped line. I want to extract the 1, 2, 3, and 4 and save them as variables. I can parse the extra lines into variables if I know how many of them there are. For example, if I know there are two extra lines, the grok filter below will work. But what should I do if I don't know, in advance, how many extra lines will exist? Is there some way to parse these lines one-by-one, before applying the multiline filter? That might help.

Also, even if I know I will only have 2 extra lines, is the filter below the best way to access them?

filter {
    multiline {
        pattern => "^%{SOMETIMESTAMP}"
        negate => "true"
        what => "previous"
    }

    if "multiline" in [tags] {
        grok {
            match => { "message" => "(?m)^%{SOMETIMESTAMP} %{DATA:firstline}(?<newline>[\r\n]+)%{DATA:secondline}(?<newline>[\r\n]+)%{DATA:thirdline}$" }
        }
    }
    # After this would be grok filters to process the contents of
    # 'firstline', 'secondline', and 'thirdline'. I would then remove
    # these three temporary fields from the final output.
}

(I separated the lines into separate variables since this allows me to do additional pattern matching on the contents of the lines separately, without having to refer to the entire pattern all over again. For example, based on the contents of the first line, I might want to present branching behavior for the other lines.)

There are 2 answers

Alcanzar On

Why do you need this?

Are you going to be inserting one single event with all of the values or are they really separate events that just need to share the same time stamp?

If they all need to appear in the same event, you'll likely need to resort to a ruby filter to separate the extra lines into fields on the event, which you can then process further.

For example:

if "multiline" in [tags] {
    grok {
        match => { "message" => "(?m)^%{SOMETIMESTAMP} %{DATA:firstline}(?<newline>[\r\n]+)" }
    }
    ruby {
       code => '
         event["lines"] = event["message"].scan(/[^\r\n]+[\r\n]*/);
       '
    }
}
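
To see what the scan call above puts into the lines field, here is a minimal standalone Ruby sketch. The sample message is an assumption modeled on the log excerpt in the question, and a plain string stands in for event["message"] so that it runs outside Logstash.

```ruby
# Assumed sample: the joined message as the multiline filter might produce it.
message = "SOMETIMESTAMP some data\nextra line 1 2\nextra line 3 4"

# Same regex as in the ruby filter: each match is a run of non-newline
# characters followed by any trailing line terminators.
lines = message.scan(/[^\r\n]+[\r\n]*/)
# lines == ["SOMETIMESTAMP some data\n", "extra line 1 2\n", "extra line 3 4"]

# The terminators stay attached to each element, so strip them before
# running further pattern matching on the individual lines.
clean = lines.map { |line| line.sub(/[\r\n]+\z/, "") }
# clean == ["SOMETIMESTAMP some data", "extra line 1 2", "extra line 3 4"]

puts clean.inspect
```

Because scan returns one element per line, the array length follows the number of extra lines, so nothing in the filter has to know that count in advance.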

If they are really separate events, you could use the memorize plugin, which works with Logstash 1.5 and later.

IMPRENABLE AUTOMATION On

This has changed across versions of ELK. As of Logstash 5.0, direct event field references (i.e. event['field']) have been disabled in favor of the event get and set methods (e.g. event.get('field')).

filter {
    grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:logtime} %{LOGLEVEL:level}%{DATA:firstline}" }
    }
    ruby { code => "event.set('message', event.get('message').scan(/[^\r\n]+[\r\n]*/))" }
}
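
As a rough sketch of how the get/set style could handle an unknown number of extra lines, the Ruby below splits the message and collects every integer from the extra lines. The FakeEvent class is a hypothetical stand-in so the code runs outside Logstash, and the field names extra_lines and extra_values, like the helper name extract_extra_values, are assumptions made for illustration only.

```ruby
# Hypothetical stand-in for a Logstash event; it only mimics get and set
# so this sketch can run outside Logstash.
class FakeEvent
  def initialize(fields)
    @fields = fields
  end

  def get(name)
    @fields[name]
  end

  def set(name, value)
    @fields[name] = value
  end
end

# Split the joined message into the timestamped line and any extra lines,
# then collect the integers found on the extra lines.
def extract_extra_values(event)
  first, *extras = event.get("message").split(/[\r\n]+/)
  event.set("firstline", first)
  event.set("extra_lines", extras) # assumed field name
  values = extras.flat_map { |line| line.scan(/\d+/).map(&:to_i) }
  event.set("extra_values", values) # assumed field name
end

# Assumed sample data modeled on the question's log excerpt.
event = FakeEvent.new("message" => "SOMETIMESTAMP some data\nextra line 1 2\nextra line 3 4")
extract_extra_values(event)
puts event.get("extra_values").inspect
```

Inside Logstash, the body of extract_extra_values would presumably go in the code option of a ruby filter, where event is already provided. An event with no extra lines simply ends up with empty arrays.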