Logstash in k8s - parsing nested JSON from MongoDB and getting every nested JSON field as a separate field


I'm using Logstash to read documents from a specific MongoDB collection and save them to Elasticsearch. Nested fields are saved in "log_entry" as a single JSON string, starting with "BSON" or "ID" depending on the manipulations I do in the filter.

Here is an example of "log_entry":

"log_entry": {\"_id\": \"122ghgh1111, \"msg_body\": {\"text_one\": 2, \"text_data\": [{\"position\": 1}, {....}, {...}]}}

There is a lot of text in log_entry, so I'm not posting everything, just the structure.

Below is my config (I've tried different approaches, so I'll share them all; none of them does what I'd like to achieve):

logstashPipeline:
   logstash.conf:
      input {
         mongodb {
            uri => 'mongodb://user:password@host:port/<db_name>?directConnection=true'
            placeholder_db_dir => '/opt/logstash-mongodb'
            placeholder_db_name => 'logstash_sqlite.db'
            collection => 'my_collection'
         }
      }
      # First try - still saves the nested JSON as a single field

      filter {
         mutate {
            gsub => [ "log_entry", "=>", ": "]
            rename => { "_id" => "mongo_id" }
            remove_field => ["_id"]
         }
         mutate {
            gsub => [ "log_entry", "BSON::ObjectID\('([0-9a-z]+)'\)", '"\1']
            rename => { "_id" => "mongo_id" }
         }
      }
      # Second try - still saves the nested JSON as a single field


      filter {
         mutate {
            rename => { "_id" => "mongo_id" }
         }
         grok {
            match => { "log_entry" => "%{WORD}\\\"\:\s\\\"%{WORD}\,\s\\\"%{WORD}\\\"\:\s\{\\\"%{WORD}\\\"\:\s%{WORD:field}" }
         }
      }
      output { elasticsearch {
            action => "index"
            index => "mongo_log_data"
            hosts => ["https://<host>:9200"]
            ssl => false
            ssl_certificate_verification => false
            user => "elastic"
            password => "some_password"
         }
      }

I would like it to be saved in ES as:

text_one: 2
text_data_position: 1

etc
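
For illustration, the target shape above could be produced without a fixed pattern by parsing the string and flattening the result. This is only a minimal sketch, assuming "log_entry" already holds valid JSON when the filter runs (the sample shown earlier has an unclosed quote after the "_id" value, which would have to be fixed first). The field name "parsed" is a hypothetical temporary field, and joining key names with "_" is an assumption taken from the desired output.

```
filter {
   # Parse the JSON string into a temporary nested field.
   # "parsed" is a hypothetical name; events with invalid JSON are
   # tagged _jsonparsefailure by default and left unchanged.
   json {
      source => "log_entry"
      target => "parsed"
   }
   # Walk the parsed structure and copy every leaf value to a
   # top-level field whose name is the key path joined with "_".
   ruby {
      code => '
         flatten = lambda do |value, prefix|
            case value
            when Hash
               value.each { |k, v| flatten.call(v, prefix.empty? ? k.to_s : "#{prefix}_#{k}") }
            when Array
               # Array items share the parent key, so repeated leaves
               # are collected into one array-valued field.
               value.each { |v| flatten.call(v, prefix) }
            else
               existing = event.get(prefix)
               event.set(prefix, existing.nil? ? value : [existing, value].flatten)
            end
         end
         parsed = event.get("parsed")
         if parsed.is_a?(Hash)
            # Assumption: the id is already kept as mongo_id, and a
            # field named _id in the document body is typically
            # rejected by Elasticsearch, so it is dropped here.
            parsed.delete("_id")
            flatten.call(parsed, "")
            event.remove("parsed")
         end
      '
   }
}
```

With the sample structure this would be expected to produce text_one and text_data_position as top-level fields; when text_data holds several objects, text_data_position becomes an array of their values.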

The problem with using grok is that I don't know how many nested fields are stored in a document, so I don't know how to build the grok regex correctly. What I have built so far catches only one field. I could of course add more regex patterns, but I would like the grok pattern to be as dynamic as possible.

Can you please help me build a working grok pattern to achieve what I need?

Thanks in advance.


Update:

I've found a regex that does what I need:

(screenshot not available)

It gives me exactly what I need, but when I try it in grok it captures only the first field:

(screenshot not available)
