Logstash aggregate filter with "multiple pipelines"


I would like to let httpd access_log entries be processed by two different logstash filters.

One of them is the "aggregate" filter, which is known to only work properly with a single worker thread. However, the other filter (let's call it "otherfilter") should be allowed to work with several worker threads, so that there is no loss of performance.

To accomplish this I would like to use the "multiple pipeline" feature of logstash. Basically one pipeline should read the data ("input pipeline") and distribute it to two other pipelines on which the two mentioned filters operate (let's call them "aggregate pipeline" and "otherfilter pipeline").

First tests have shown that the results of the aggregate filter are not correct if the input pipeline is set up to work with more than one thread. That is, when aggregating over 60-second intervals, an event counter sometimes shows more and sometimes fewer events than actually occurred. The problem seems to be that events arrive out of order in the aggregate filter, and thus the intervals (whose start and end are determined from the timestamp field) are incorrect.

So I ask myself whether what I want to achieve is at all feasible with "multiple pipelines"?


There is 1 answer

leandrojmp

You can break up a single pipeline into multiple pipelines, but since you want to use the aggregate filter, you need to make sure that everything that happens before the event enters the aggregate filter is running with only one worker.

For example, suppose you broke up your pipeline into pipeline A, which is your input, pipeline B, which is your aggregate filter, and pipeline C, which is your other filter.

This will only work if:

  • Pipeline A is running with only one worker.
  • Pipeline B is running with only one worker.
  • Pipeline C runs after pipeline B and doesn't rely on the order of the events.
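
As a rough sketch, the worker settings for such a three-pipeline layout could be declared in pipelines.yml along these lines. The pipeline ids, the file paths, and the worker count of 4 are only placeholders I made up for illustration; adjust them to your setup.

```yaml
# pipelines.yml (sketch; ids, paths and the worker count of 4 are hypothetical)

# Pipeline A: reads the access_log; one worker so events keep their order
- pipeline.id: input-a
  pipeline.workers: 1
  path.config: "/etc/logstash/conf.d/a-input.conf"

# Pipeline B: the aggregate filter; one worker, as the filter requires
- pipeline.id: aggregate-b
  pipeline.workers: 1
  path.config: "/etc/logstash/conf.d/b-aggregate.conf"

# Pipeline C: the other filter; order does not matter here, so several workers are fine
- pipeline.id: otherfilter-c
  pipeline.workers: 4
  path.config: "/etc/logstash/conf.d/c-otherfilter.conf"
```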

If your input pipeline is running with more than one worker, you can't guarantee the order of the events when they enter your aggregate pipeline. So your input and your aggregate filter should basically be in the same pipeline, running with a single worker; you can then direct its output to the other filter pipeline, which can run with more than one worker.
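
A minimal sketch of that layout, using the pipeline input and output plugins for pipeline-to-pipeline communication. The pipeline ids, the virtual address "otherfilter", the log path, the worker count, and the stdout output are assumptions for illustration, and the filter bodies are left as comments because they depend on your existing configuration.

```yaml
# pipelines.yml (sketch; ids, address, path and worker count are hypothetical)

# Input and aggregate filter together in one single-worker pipeline
- pipeline.id: input-and-aggregate
  pipeline.workers: 1
  config.string: |
    input { file { path => "/var/log/httpd/access_log" } }
    filter {
      # your existing aggregate { ... } block goes here
    }
    # forward everything this pipeline emits to the pipeline listening on "otherfilter"
    output { pipeline { send_to => ["otherfilter"] } }

# The other filter in its own multi-worker pipeline
- pipeline.id: otherfilter
  pipeline.workers: 4
  config.string: |
    input { pipeline { address => "otherfilter" } }
    filter {
      # your other filter goes here
    }
    output { stdout { codec => rubydebug } }
```

Note that in this sketch any events the aggregate filter itself generates are also forwarded to the second pipeline, so you may want to tag them and handle them with a conditional there.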