Can fluent-bit parse multiple types of log lines from one file?

I have a fairly simple Apache deployment in k8s using fluent-bit v1.5 as the log forwarder. My setup is nearly identical to the one in the repo below. I'm running AWS EKS and outputting the logs to AWS ElasticSearch Service.

https://github.com/fluent/fluent-bit-kubernetes-logging

The ConfigMap is here: https://github.com/fluent/fluent-bit-kubernetes-logging/blob/master/output/elasticsearch/fluent-bit-configmap.yaml

The Apache access log lines (written to /dev/stdout) and error log lines (written to /dev/stderr) both end up in the same container log file on the node. The problem I'm having is that fluent-bit doesn't seem to autodetect which Parser to use, and I'm not sure it's supposed to. We can only specify one parser in the deployment's annotation section, and I've specified apache. So in the end, the error log lines, which come from stderr but are written to the same file, are not parsed. Should I be sending the logs from fluent-bit to fluentd to handle the error lines, assuming fluentd can handle this, or should I somehow pump only the error lines back into fluent-bit for parsing?
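
For context, the parser is chosen per pod via an annotation that fluent-bit's kubernetes filter picks up; mine looks roughly like this (apache is the stock parser name from fluent-bit's parsers.conf):

metadata:
  annotations:
    fluentbit.io/parser: apache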

Am I missing something?

Thanks!

There are 3 answers

Answer from chakatz (accepted):

I was able to apply a second (and a third) parser to the logs by using a fluent-bit FILTER section with the parser plugin, like below.

Documented here: https://docs.fluentbit.io/manual/pipeline/filters/parser

[FILTER]
    Name            parser
    Match           kube.*
    Parser          apache_error_custom
    Parser          apache_error
    Preserve_Key    On
    Reserve_Data    On
    Key_Name        log
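
Reserve_Data On keeps the other fields of the record after parsing, and Preserve_Key On keeps the original log field alongside the parsed result. For reference, apache_error is one of the stock parsers shipped in fluent-bit's parsers.conf (apache_error_custom is my own variant of it); the stock definition looks roughly like this:

[PARSER]
    Name   apache_error
    Format regex
    Regex  ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$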

Answer from Max Lobur:

Didn't see this for FluentBit, but for Fluentd:
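
The original config snippet isn't shown here, but a typical Fluentd setup for this uses the fluent-plugin-multi-format-parser, which tries each <pattern> block in order until one matches. A rough sketch (the path, tag, and format names are illustrative):

<source>
  @type tail
  path /var/log/containers/*.log
  tag kube.*
  <parse>
    @type multi_format
    <pattern>
      format apache2
    </pattern>
    <pattern>
      format json
    </pattern>
    <pattern>
      format none
    </pattern>
  </parse>
</source>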

Note that format none as the last option means the log line is kept as-is, i.e. plain text, if nothing else matched.

You can also use FluentBit as a pure log collector and run a separate Deployment with Fluentd that receives the stream from FluentBit, parses it, and handles all the outputs. In that case, use the forward output in FluentBit and a forward source (@type forward) in Fluentd. Docs: https://docs.fluentbit.io/manual/pipeline/outputs/forward
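
A minimal sketch of the two sides, assuming Fluentd is reachable at a Service named fluentd in the logging namespace (the hostname and port are assumptions; adjust to your cluster). FluentBit side:

[OUTPUT]
    Name   forward
    Match  *
    Host   fluentd.logging.svc.cluster.local
    Port   24224

Fluentd side:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>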

Answer from Dean Meehan:

Fluentbit is able to run multiple parsers on the same input.

If you add multiple parsers to your Parser filter, each on its own Parser line (this applies to non-multiline parsing; multiline parsers support a comma-separated list instead), e.g.:

[Filter]
    Name Parser
    Match *
    Parser parse_common_fields
    Parser json
    Key_Name log

The 1st parser parse_common_fields will attempt to parse the log, and only if it fails will the 2nd parser json attempt to parse it.

If instead you want to parse a log and then parse the result again (for example, when only part of your log line is JSON), you'll want to chain two Parser filters one after the other, like:

[Filter]
    Name Parser
    Match *
    Parser parse_common_fields
    Key_Name log

[Filter]
    Name Parser
    Match *
    Parser json
    # This is the key from the parse_common_fields regex that we expect to contain JSON
    Key_Name log

Here is an example you can run to test this out:

Example

We attempt to parse log lines where the payload is sometimes JSON and sometimes not.

Example log lines

2022-07-28T22:03:44.585+0000 [http-nio-8080-exec-3] [2a166faa-dbba-4210-a328-774861e3fdef][0ed32f19-47bb-4c1f-92c2-c9b7c43aa91f] INFO  SomeService:000 - Using decorator records threshold: 0
2022-07-29T11:36:59.236+0000 [http-nio-8080-exec-3] [][] INFO  CompleteOperationLogger:25 - {"action":"Complete","operation":"healthcheck","result":{"outcome":"Succeeded"},"metrics":{"delayBeforeExecution":0,"duration":0},"user":{},"tracking":{}}

parser.conf

[PARSER]
    Name   parse_common_fields
    Format regex
    Regex ^(?<timestamp>[^ ]+)\..+ \[(?<log_type>[^ \[\]]+)\] \[(?<transaction_id>[^ \[\]]*)\]\[(?<transaction_id2>[^ \[\]]*)\] (?<level>[^ ]*)\s+(?<service_id>[^ ]+) - (?<log>.+)$
    Time_Format %Y-%m-%dT%H:%M:%S
    Time_Key    timestamp

[PARSER]
    Name   json
    Format json

fluentbit.conf

[SERVICE]
    Flush     1
    Log_Level info
    Parsers_File parser.conf

[INPUT]
    Name   dummy
    Dummy  {"log": "2022-07-28T22:03:44.585+0000 [http-nio-8080-exec-3] [2a166faa-dbba-4210-a328-774861e3fdef][0ed32f19-47bb-4c1f-92c2-c9b7c43aa91f] INFO  AnonymityService:245 - Using decorator records threshold: 0"}
    Tag    testing.deanm.non-json

[INPUT]
    Name   dummy
    Dummy  {"log": "2022-07-29T11:36:59.236+0000 [http-nio-8080-exec-3] [][] INFO  CompleteOperationLogger:25 - {\"action\":\"Complete\",\"operation\":\"healthcheck\",\"result\":{\"outcome\":\"Succeeded\"},\"metrics\":{\"delayBeforeExecution\":0,\"duration\":0},\"user\":{},\"tracking\":{}}"}
    Tag    testing.deanm.json

[Filter]
    Name Parser
    Match *
    Parser parse_common_fields
    Key_Name log

[Filter]
    Name Parser
    Match *
    Parser json
    Key_Name log

[OUTPUT]
    Name  stdout
    Match *
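
To try this locally, save parser.conf and fluentbit.conf side by side and run (assuming the fluent-bit binary is on your PATH):

fluent-bit -c fluentbit.conf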

Results

After the parse_common_fields filter runs on the log lines, the common fields are parsed successfully, and the remaining log key holds either a plain string or an escaped JSON string:

First Pass

[0] testing.deanm.non-json: [1659045824.000000000, {"log_type"=>"http-nio-8080-exec-3", "transaction_id"=>"2a166faa-dbba-4210-a328-774861e3fdef", "transaction_id2"=>"0ed32f19-47bb-4c1f-92c2-c9b7c43aa91f", "level"=>"INFO", "service_id"=>"AnonymityService:245", "log"=>"Using decorator records threshold: 0"}]
[0] testing.deanm.json: [1659094619.000000000, {"log_type"=>"http-nio-8080-exec-3", "level"=>"INFO", "service_id"=>"CompleteOperationLogger:25", "log"=>"{"action":"Complete","operation":"healthcheck","result":{"outcome":"Succeeded"},"metrics":{"delayBeforeExecution":0,"duration":0},"user":{},"tracking":{}}"}]

Once the json filter parses the logs, the embedded JSON is also parsed correctly:

Second Pass

[0] testing.deanm.non-json: [1659045824.000000000, {"log_type"=>"http-nio-8080-exec-3", "transaction_id"=>"2a166faa-dbba-4210-a328-774861e3fdef", "transaction_id2"=>"0ed32f19-47bb-4c1f-92c2-c9b7c43aa91f", "level"=>"INFO", "service_id"=>"AnonymityService:245", "log"=>"Using decorator records threshold: 0"}]
[0] testing.deanm.json: [1659094619.000000000, {"action"=>"Complete", "operation"=>"healthcheck", "result"=>{"outcome"=>"Succeeded"}, "metrics"=>{"delayBeforeExecution"=>0, "duration"=>0}, "user"=>{}, "tracking"=>{}}]

Note: the difference between the two passes above is that after the first pass the JSON string is still inside the log key, while the second pass parses the JSON into its own keys, e.g.:

Pass1:

[1659094619.000000000, {"log"=>"{"action":"Complete", ...

Pass2:

[1659094619.000000000, {"action"=>"Complete", ...