Fluentbit S3 Configuration: How to Get Log Paths Similar to FluentD?


I am in the process of converting from FluentD to Fluent Bit to ship logs from K8s to S3. I need some help with rewrite_tag and pushing logs to the correct path in S3.

FluentD config:

  <record>
    environment ${record["kubernetes"]["namespace_name"]}
    pod ${record["kubernetes"]["pod_name"]}
    podid ${record["kubernetes"]["pod_id"]}
    container ${record["kubernetes"]["container_name"]}
  </record>
</filter>
<match **>
  @type s3
  s3_bucket logs.bucket.us-east-1.domain.com
  s3_region us-east-1
  s3_object_key_format %Y/%m/%d/${environment}/${container}/${container}-${environment}-%Y%m%d-%H%M-${podid}-%{index}.%{file_extension}
  store_as text

Fluentbit Config:

        [INPUT]
            Name              tail
            Tag               s3logs.*
            Path              /var/log/containers/*.log
            parser            cri
            multiline.parser  cri
            Mem_Buf_Limit     5MB
            Skip_Long_Lines   On
            Skip_Empty_Lines  On
            Refresh_Interval  10
        [FILTER]
            Name                kubernetes
            Match               s3logs.*
            Merge_Log           On
            K8S-Logging.Parser  On
            K8S-Logging.Exclude On
            Keep_Log            Off
            Labels              Off   
            Annotations         Off         
        [FILTER]
            Name    record_modifier
            Match   s3logs.*
            Record  cluster_name ${CLUSTER}
        [FILTER]
            name    lua
            alias   set_std_keys
            match   s3logs.*
            script  /fluent-bit/scripts/s3_path.lua
            call    set_std_keys
        [FILTER]
            name rewrite_tag
            match s3logs.*
            rule $log ^.*$ s3.$namespace_name.$app_name.$container_name.$pod_id true               

      outputs: |
        [OUTPUT]
            Name s3
            Match s3logs.*
            bucket                       logs.bucket.us-east-1.domain.com
            region                       us-east-1
            s3_key_format                /%Y/%m/%d/$TAG[1]/$TAG[2]/$TAG[3]/$TAG[3]-$TAG[1]-%Y%m%d-%H%M-${podid}.txt
            store_dir                    /var/log/fluentbit-s3-buffers
            total_file_size              256MB
            upload_timeout               2m
            use_put_object               On
            preserve_data_ordering       On

S3 Lua script - based on my reading, you have to use a Lua script to extract the K8s metadata:

function set_std_keys(tag, timestamp, record)

    -- Default the cluster name when record_modifier has not set it
    if (record["cluster_name"] == nil) then
        record["cluster_name"] = "mycluster"
    end

    if (record["kubernetes"] ~= nil) then
        local kube = record["kubernetes"]

        -- Pull up namespace
        if (kube["namespace_name"] ~= nil and string.len(kube["namespace_name"]) > 0) then
            record["namespace_name"] = kube["namespace_name"]
        else
            record["namespace_name"] = "default"
        end

        -- Pull up container name
        if (kube["container_name"] ~= nil and string.len(kube["container_name"]) > 0) then
            record["container_name"] = kube["container_name"]
        end

        -- Pull up pod id
        if (kube["pod_id"] ~= nil and string.len(kube["pod_id"]) > 0) then
            record["pod_id"] = kube["pod_id"]
        end

        -- Pull up app name (Deployment, StatefulSet, DaemonSet, Job, CronJob, etc.)
        if (kube["labels"] ~= nil) then
            local labels = kube["labels"]

            if (labels["app"] ~= nil and string.len(labels["app"]) > 0) then
                record["app_name"] = labels["app"]
            elseif (labels["app.kubernetes.io/instance"] ~= nil and string.len(labels["app.kubernetes.io/instance"]) > 0) then
                record["app_name"] = labels["app.kubernetes.io/instance"]
            elseif (labels["k8s-app"] ~= nil and string.len(labels["k8s-app"]) > 0) then
                record["app_name"] = labels["k8s-app"]
            elseif (labels["name"] ~= nil and string.len(labels["name"]) > 0) then
                record["app_name"] = labels["name"]
            end
        end
    end

    -- 2 = record was modified, keep the original timestamp
    return 2, timestamp, record
end

Shipping logs via FluentD goes to the correct path in S3, for example: 2023/10/14/namespace/container_name/container-name-namespace_name-2023-10-14-UUID.txt

Shipping logs via Fluent Bit goes to the wrong path in S3, for example: 2023/10/14/var/log/containers/containers-var-20231014-0759-.log-object00N1PX3n

How can I get the logs from Fluent Bit shipped to the correct path? I'm sure it's just a configuration issue, but I've been through the Fluent Bit docs, sought help on Slack, and even scrolled through GitHub, but to no avail.

1 Answer

Answered by VonC:

If I understand correctly, you have:

[ Input ] ---> [ Kubernetes Filter ] ---> [ Lua Script Filter ] 
   ---> [ Tag Rewrite Filter ] ---> [ S3 Output ]

The issue is about constructing the correct s3_key_format and making sure the rewrite_tag filter and the Lua script populate the fields in the log record that are used to build the S3 object key.
In other words, the goal is to align Fluent Bit's S3 path formatting with FluentD's.

Try updating the s3_key_format option in the [OUTPUT] section of your Fluent Bit configuration to match the format used in FluentD, and make sure the field references map to the log record fields populated by your Lua script and other filters.

s3_key_format /%Y/%m/%d/${namespace_name}/${container_name}/${container_name}-${namespace_name}-%Y%m%d-%H%M-${pod_id}.txt

Also verify that your Lua script is correctly extracting and setting the required fields (namespace_name, container_name, and pod_id) in the log records. To check that, add some logging to the script to print the values being set.

function set_std_keys(tag, timestamp, record)
    -- existing script...
    -- add logging to verify field values
    print("namespace_name: " .. (record["namespace_name"] or "nil"))
    print("container_name: " .. (record["container_name"] or "nil"))
    print("pod_id: " .. (record["pod_id"] or "nil"))
    return 2, timestamp, record
end
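
The print output goes to Fluent Bit's own stdout, so the values will show up in the logs of the Fluent Bit pod itself (for example via kubectl logs on that pod), not in the records shipped to S3.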

Also, make sure the rewrite_tag filter is correctly configured to generate the tags you need. Verify that the regex and replacement pattern are correct, and that the fields referenced in the pattern are being populated as expected.

[FILTER]
    name rewrite_tag
    match s3logs.*
    rule $log ^.*$ s3.$namespace_name.$app_name.$container_name.$pod_id true 
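
Keep in mind that rewrite_tag re-emits matching records under the new tag (here s3.<namespace>...), so the [OUTPUT] Match pattern has to cover those rewritten tags. With Match s3logs.* as in the question, only the original records reach the S3 output, and since the tail input expands its tag from the file path, that would explain object keys like .../var/log/containers/... in the bucket.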

Notes:

  • In Fluent Bit, the s3_key_format_tag_delimiters option lets you specify the characters used to split the tag into parts, which can then be referenced as $TAG[n] in the s3_key_format option. Make sure your tags are structured so that this splitting yields the path components you need (a fuller sketch follows after these notes):

    s3_key_format_tag_delimiters .
    
  • FluentD allows for specific time formatting and tagging through its <buffer> configuration. If possible, try to replicate that setup in Fluent Bit to achieve similar path formatting (a sample <buffer> block is sketched below for reference).
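
Putting those pieces together, here is a minimal sketch of what the [OUTPUT] section could look like once the rewrite_tag filter emits tags of the form s3.<namespace>.<app>.<container>.<pod_id>. The bucket, store_dir and size settings are carried over from the question; the $TAG[n] indices are assumptions that depend on how the rewritten tag actually splits, so verify them against your real tags:

    [OUTPUT]
        Name                          s3
        # match the rewritten tags, not the original s3logs.* tags
        Match                         s3.*
        bucket                        logs.bucket.us-east-1.domain.com
        region                        us-east-1
        # split the tag on '.' so its parts can be referenced as $TAG[n]
        s3_key_format_tag_delimiters  .
        # with a tag of s3.<namespace>.<app>.<container>.<pod_id>:
        #   $TAG[1] = namespace, $TAG[3] = container, $TAG[4] = pod_id
        s3_key_format                 /%Y/%m/%d/$TAG[1]/$TAG[3]/$TAG[3]-$TAG[1]-%Y%m%d-%H%M-$TAG[4].txt
        store_dir                     /var/log/fluentbit-s3-buffers
        total_file_size               256MB
        upload_timeout                2m
        use_put_object                On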
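
For comparison, on the FluentD side the ${environment}, ${container} and ${podid} placeholders in s3_object_key_format generally need to be declared as chunk keys in the buffer section so that each chunk carries those values. A minimal sketch of such a <buffer> block (the timekey values here are illustrative assumptions, not taken from the original config):

    <buffer time,environment,container,podid>
      @type file
      path /var/log/fluentd-buffers/s3.buffer
      # one chunk per hour (illustrative)
      timekey 3600
      # how long to wait for late records before flushing a closed chunk (illustrative)
      timekey_wait 10m
      timekey_use_utc true
    </buffer>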