Send consecutive invalid JSON lines between valid JSON lines in a single Filebeat message


I have a file that contains line-separated JSON objects interleaved with non-JSON data (stderr stack traces).

{"timestamp": "20170104T17:10:39", "retry": 0, "level": "info", "event": "failed to download"}
{"timestamp": "20170104T17:10:40", "retry": 1, "level": "info", "event": "failed to download"}
{"timestamp": "20170104T17:10:41", "retry": 2, "level": "info", "event": "failed to download"}
Traceback (most recent call last):
  File "a.py", line 12, in <module>
    foo()
  File "a.py", line 10, in foo
    bar()
  File "a.py", line 4, in bar
    raise Exception("This was unexpected")
Exception: This was unexpected
{"timestamp": "20170104T17:10:42", "retry": 3, "level": "info", "event": "failed to download"}
{"timestamp": "20170104T17:10:43", "retry": 4, "level": "info", "event": "failed to download"}

Using the following config, I'm able to ingest the valid JSON lines properly, but each invalid JSON line is being sent individually (one event per line).

filebeat.yml

filebeat.prospectors:
  - input_type: log
    document_type: mytype
    json:
      message_key: event
      add_error_key: true
    paths:
        - /tmp/*.log

output:
  console:
    pretty: true

  file:
    path: "/tmp/filebeat"
    filename: filebeat

This produces the following output:

{
  "@timestamp": "2017-01-04T12:03:36.659Z",
  "beat": {
    "hostname": "...", "name": "...", "version": "5.1.1"
  },
  "input_type": "log",
  "json": {
    "event": "failed to download",
    "level": "info",
    "retry": 2,
    "timestamp": "20170104T17:10:41"
  },
  "offset": 285,
  "source": "/tmp/test.log",
  "type": "mytype"
}
{
  "@timestamp": "2017-01-04T12:03:36.659Z",
  "beat": {
    "hostname": "...", "name": "...", "version": "5.1.1"
  },
  "input_type": "log",
  "json": {
    "event": "Traceback (most recent call last):",
    "json_error": "Error decoding JSON: invalid character 'T' looking for beginning of value"
  },
  "offset": 320,
  "source": "/tmp/test.log",
  "type": "mytype"
}

I want to club together all the non-JSON lines, up to the next JSON line, into one message.

Using multiline, I tried the following:

filebeat.prospectors:
  - input_type: log
    document_type: mytype
    json:
      message_key: event
      add_error_key: true
    paths:
        - /tmp/*.log
    multiline:
      pattern: '^{'
      negate: true
      match: after

output:
  console:
    pretty: true

  file:
    path: "/tmp/filebeat"
    filename: filebeat

But it doesn't seem to work: it applies the multiline rules to the value of the event key, which was specified in json.message_key.

From the docs on json.message_key, I understand why that is happening:

JSON key on which to apply the line filtering and multiline settings. This key must be top level and its value must be string, otherwise it is ignored. If no text key is defined, the line filtering and multiline features cannot be used.

Is there any other way to club consecutive non-JSON lines into a single message?

I'd like the entire stack trace to be captured as one event before it is sent to Logstash.
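In other words, the grouping I'm after could be sketched in Python like this (illustrative only, of course, not something Filebeat itself runs):

```python
import json

def group_lines(lines):
    """Yield each valid JSON line as its own message; buffer
    consecutive non-JSON lines and yield them joined as one message."""
    buffer = []
    for line in lines:
        try:
            json.loads(line)
        except ValueError:
            buffer.append(line)  # non-JSON: keep accumulating
            continue
        if buffer:               # a JSON line flushes the buffer first
            yield "\n".join(buffer)
            buffer = []
        yield line
    if buffer:                   # trailing non-JSON lines at EOF
        yield "\n".join(buffer)

log = [
    '{"retry": 2, "event": "failed to download"}',
    'Traceback (most recent call last):',
    '  File "a.py", line 12, in <module>',
    '{"retry": 3, "event": "failed to download"}',
]
print(len(list(group_lines(log))))  # 3 messages: json, stack trace, json
```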

1 Answer

A J (best answer):

Filebeat applies the multiline grouping after the JSON parsing, so the multiline pattern cannot be based on the characters that make up the JSON object (e.g. {).

In Filebeat there is another way to do JSON parsing, one in which the JSON parsing occurs after the multiline grouping, so your pattern can include the JSON object characters. You need Filebeat 5.2 (soon to be released), because a target field was added to the decode_json_fields processor that lets you specify where the decoded JSON fields are added to the event.

filebeat.prospectors:
- paths: [input.txt]
  multiline:
    pattern: '^({|Traceback)'
    negate:  true
    match:   after

processors:
- decode_json_fields:
    when.regexp:
      message: '^{'
    fields: message
    target:
- drop_fields:
    when.regexp:
      message: '^{'
    fields: message
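To illustrate what that processor pair achieves, here is a rough Python sketch (not Filebeat's actual implementation): when the message field looks like JSON, its decoded keys are merged into the event root (the empty target) and the raw message field is dropped; non-JSON messages pass through untouched.

```python
import json
import re

def apply_processors(event):
    """Mimics decode_json_fields + drop_fields with when.regexp '^{':
    decode 'message' into the event root, then drop 'message'."""
    msg = event.get("message", "")
    if re.match(r"^{", msg):
        event.update(json.loads(msg))  # decoded keys land at the root
        del event["message"]           # raw JSON string is removed
    return event

evt = {"message": '{"retry": 0, "event": "failed to download"}', "type": "log"}
print(apply_processors(evt))  # 'message' replaced by its decoded fields

trace = {"message": "Traceback (most recent call last):", "type": "log"}
print(apply_processors(trace))  # non-JSON message is left as-is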

I tested the multiline pattern using the Go Playground.
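The same semantics can also be checked outside Filebeat. With negate: true and match: after, any line that does not match the pattern is appended to the previous event; a quick Python sketch of that rule:

```python
import re

# Same pattern as the config; negate: true + match: after means a
# non-matching line continues the previous event.
pattern = re.compile(r"^({|Traceback)")

lines = [
    '{"retry": 2, "event": "failed to download"}',
    "Traceback (most recent call last):",
    '  File "a.py", line 12, in <module>',
    "    foo()",
    "Exception: This was unexpected",
    '{"retry": 3, "event": "failed to download"}',
]

events = []
for line in lines:
    if pattern.match(line) or not events:
        events.append(line)            # pattern matched: new event
    else:
        events[-1] += "\n" + line      # continuation of previous event
print(len(events))  # 3 events: json, whole stack trace, json
```

Note that the stack trace is started by the Traceback alternative in the pattern, and all its indented continuation lines fold into that one event.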

Filebeat produces the following output (using the log lines you gave above as the input). (I used a build from the master branch.)

{"@timestamp":"2017-01-05T20:34:18.862Z","beat":{"hostname":"host.example.com","name":"host.example.com","version":"5.2.0-SNAPSHOT"},"event":"failed to download","input_type":"log","level":"info","offset":95,"retry":0,"source":"input.txt","timestamp":"20170104T17:10:39","type":"log"}
{"@timestamp":"2017-01-05T20:34:18.862Z","beat":{"hostname":"host.example.com","name":"host.example.com","version":"5.2.0-SNAPSHOT"},"event":"failed to download","input_type":"log","level":"info","offset":190,"retry":1,"source":"input.txt","timestamp":"20170104T17:10:40","type":"log"}
{"@timestamp":"2017-01-05T20:34:18.862Z","beat":{"hostname":"host.example.com","name":"host.example.com","version":"5.2.0-SNAPSHOT"},"event":"failed to download","input_type":"log","level":"info","offset":285,"retry":2,"source":"input.txt","timestamp":"20170104T17:10:41","type":"log"}
{"@timestamp":"2017-01-05T20:34:18.862Z","beat":{"hostname":"host.example.com","name":"host.example.com","version":"5.2.0-SNAPSHOT"},"input_type":"log","message":"Traceback (most recent call last):\n  File \"a.py\", line 12, in \u003cmodule\u003e\n    foo()\n  File \"a.py\", line 10, in foo\n    bar()\n  File \"a.py\", line 4, in bar\n    raise Exception(\"This was unexpected\")\nException: This was unexpected","offset":511,"source":"input.txt","type":"log"}
{"@timestamp":"2017-01-05T20:34:18.862Z","beat":{"hostname":"host.example.com","name":"host.example.com","version":"5.2.0-SNAPSHOT"},"event":"failed to download","input_type":"log","level":"info","offset":606,"retry":3,"source":"input.txt","timestamp":"20170104T17:10:42","type":"log"}
{"@timestamp":"2017-01-05T20:34:18.862Z","beat":{"hostname":"host.example.com","name":"host.example.com","version":"5.2.0-SNAPSHOT"},"event":"failed to download","input_type":"log","level":"info","offset":702,"retry":4,"source":"input.txt","timestamp":"20170104T17:10:43","type":"log"}