
I am trying to find a way, in the Fluent Bit config, to tell/force ES to store plain JSON-formatted logs (the log bit below that comes from Docker stdout/stderr) in a structured way; please see the image at the bottom for a better explanation. For example, apart from (or along with) storing the log as a plain JSON entry under the log field, I would like to store each property individually, as shown in red.

The documentation for Filters and Parsers is really poor and not clear. On top of that, the forward input doesn't have a "parser" option. I tried the json/docker/regex parsers but no luck. My regex is here if I have to use regex. Currently using ES (7.1), Fluent-bit (1.1.3) and Kibana (7.1), not Kubernetes.

If anyone can direct me to an example or provide one, it would be much appreciated.

Thanks

{
  "_index": "hello",
  "_type": "logs",
  "_id": "T631e2sBChSKEuJw-HO4",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2019-06-21T21:34:02.000Z",
    "tag": "php",
    "container_id": "53154cf4d4e8d7ecf31bdb6bc4a25fdf2f37156edc6b859ba0ddfa9c0ab1715b",
    "container_name": "/hello_php_1",
    "source": "stderr",
    "log": "{\"time_local\":\"2019-06-21T21:34:02+0000\",\"client_ip\":\"-\",\"remote_addr\":\"192.168.192.3\",\"remote_user\":\"\",\"request\":\"GET / HTTP/1.1\",\"status\":\"200\",\"body_bytes_sent\":\"0\",\"request_time\":\"0.001\",\"http_referrer\":\"-\",\"http_user_agent\":\"curl/7.38.0\",\"request_id\":\"91835d61520d289952b7e9b8f658e64f\"}"
  },
  "fields": {
    "@timestamp": [
      "2019-06-21T21:34:02.000Z"
    ]
  },
  "sort": [
    1561152842000
  ]
}


My fluent-bit.conf:

[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    debug
    Parsers_File parsers.conf

[INPUT]
    Name   forward
    Listen 0.0.0.0
    Port   24224

[OUTPUT]
    Name  es
    Match hello_*
    Host  elasticsearch
    Port  9200
    Index hello
    Type  logs
    Include_Tag_Key On
    Tag_Key tag


5 Answers


The solution is as follows (the expected result is sketched after the config).

[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    debug
    Parsers_File parsers.conf

[INPUT]
    Name         forward
    storage.type filesystem
    Listen       my_fluent_bit_service
    Port         24224

[FILTER]
    Name         parser
    Parser       docker
    Match        hello_*
    Key_Name     log
    Reserve_Data On
    Preserve_Key On

[OUTPUT]
    Name            es
    Host            my_elasticsearch_service
    Port            9200
    Match           hello_*
    Index           hello
    Type            logs
    Include_Tag_Key On
    Tag_Key         tag

parsers.conf:

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # Command         | Decoder      | Field | Optional Action
    # ================|==============|=======|=================
    Decode_Field_As     escaped_utf8   log     do_next
    Decode_Field_As     json           log
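
With Reserve_Data and Preserve_Key on, the decoded properties are added alongside the original keys, so each one becomes an individually searchable field. Roughly, the _source of the example document from the question should end up looking like this (a sketch, with the log string trimmed for brevity):

{
  "@timestamp": "2019-06-21T21:34:02.000Z",
  "tag": "php",
  "container_id": "53154cf4d4e8d7ecf31bdb6bc4a25fdf2f37156edc6b859ba0ddfa9c0ab1715b",
  "container_name": "/hello_php_1",
  "source": "stderr",
  "log": "{\"time_local\":\"2019-06-21T21:34:02+0000\", ...}",
  "time_local": "2019-06-21T21:34:02+0000",
  "client_ip": "-",
  "remote_addr": "192.168.192.3",
  "remote_user": "",
  "request": "GET / HTTP/1.1",
  "status": "200",
  "body_bytes_sent": "0",
  "request_time": "0.001",
  "http_referrer": "-",
  "http_user_agent": "curl/7.38.0",
  "request_id": "91835d61520d289952b7e9b8f658e64f"
}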

Comments

Thank you very much for this answer. The documentation is simply horrendous. One question: how is your log entry ultimately decoded? I get a line of key=value pairs (such as name=john age=27 city=paris) and not a decoded structure (it is not a JSON string anymore, but not a structure visible in Kibana either).
Not sure I understand exactly what you mean, but my application logs are in JSON format by default. So your example would be {"name":"john","age":"27","city":"paris"} if it were my application. This whole string then also looks the same in Kibana under the log key, as shown in the image above. I hope that helps. Also have a look at this for a much more detailed example.
Sorry for not having been clear. I used to have {"name":"john","age":"27","city":"paris"} as the message entry in my log, displayed as such by Kibana. I was hoping that this entry could be decoded by Fluent Bit so that it reaches Elasticsearch as a true JSON entry, and so that I get the keys name, age and city as fields (at the same level as your tag or source entries).
(cont'd) What I have is still a message entry, which is now name=john age=27 city=paris (instead of the JSON string representation from before). I was wondering if this is the expected behaviour (which would make the decoder useless, because I cannot search on the key city, for instance).
In other words, the entry under message has been rewritten from the string {"name":"john","age":"27","city":"paris"} into the string name=john age=27 city=paris, which is not the parsing I expected (→ to "explode" the JSON string into actual fields for Kibana).
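
A quick way to see what Fluent Bit actually emits, before Kibana's rendering gets involved, is to add a temporary stdout output formatted as JSON; a minimal sketch, assuming the hello_* tags from this question:

[OUTPUT]
    Name   stdout
    Match  hello_*
    Format json_lines

If the printed record has name, age and city as top-level keys, the parsing worked and the key=value form is just how it is being displayed; if it still shows a single message string, the parser filter is not being applied to that key.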

Based on a log file with JSON objects separated by newlines, I was able to get it working with this config. No filters/decoders necessary.

The key point was to create a JSON parser, and set the parser name in the INPUT section.

Note that the tail docs say you should set up the DB option (a sketch of that is included after the config below). This is just a minimal config to get it working.

fluent-bit.conf:

[SERVICE]
  Parsers_File parsers.conf

[INPUT]
  Name tail
  Parser myparser
  Path /json_objs_separated_by_newlines.log

[OUTPUT]
  Name es
  Host elasticsearch
  # Required for Elasticsearch 8+
  Suppress_Type_Name On

parsers.conf:

[PARSER]
  Name myparser
  Format json
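
For completeness, a sketch of the DB option mentioned above; the path here is just an example, any writable location works:

[INPUT]
  Name   tail
  Parser myparser
  Path   /json_objs_separated_by_newlines.log
  # Track file offsets across restarts so lines are not re-read
  DB     /var/lib/fluent-bit/tail.db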



Answering for a more general use case: when using Firelens with the aws-for-fluent-bit image, the message ends up under a top-level log key, like here:

{
    "log": "{\"time_local\":\"2019-06-21T21:34:02+0000\",\"client_ip\":\"-\",\"remote_addr\":\"192.168.192.3\",\"remote_user\":\"\",\"request\":\"GET / HTTP/1.1\",\"status\":\"200\",\"body_bytes_sent\":\"0\",\"request_time\":\"0.001\",\"http_referrer\":\"-\",\"http_user_agent\":\"curl/7.38.0\",\"request_id\":\"91835d61520d289952b7e9b8f658e64f\"}"
}

Following this official AWS example, notice that a JSON parser config already exists in the image; it can be used like so (a sketch of what it amounts to is included after the result below):

"firelensConfiguration": {
    "type": "fluentbit",
    "options": {
        "config-file-type": "file",
        "config-file-value": "/fluent-bit/configs/parse-json.conf"
    }
}

Or it can be enabled with an environment variable:

"environment": [
    {
        "name": "aws_fluent_bit_init_file_1",
        "value": "/fluent-bit/configs/parse-json.conf"
    }
]

Result:

{
    "time_local": "2019-06-21T21:34:02+0000",
    "client_ip": "-",
    "remote_addr": "192.168.192.3",
    "remote_user": "",
    "request": "GET / HTTP/1.1",
    "status": "200",
    "body_bytes_sent": "0",
    "request_time": "0.001",
    "http_referrer": "-",
    "http_user_agent": "curl/7.38.0",
    "request_id": "91835d61520d289952b7e9b8f658e64f"
}
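
For reference, the parse-json.conf shipped in the image is essentially a parser filter applied to the log key. A rough sketch of what it amounts to (not necessarily the exact file contents, so check the image if you need the specifics):

[FILTER]
    Name         parser
    Match        *
    Key_Name     log
    Parser       json
    Reserve_Data True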



Result:

{
    "time_local": "2019-06-21T21:34:02+0000",
    "client_ip": "-",
    "remote_addr": "192.168.192.3",
    ...
    "request_id": "91835d61520d289952b7e9b8f658e64f"
}

How can the fields from the above result be set up as regular "Available Fields" in the OpenSearch dashboard? (In my case, different inputs in one index group share some fields and differ on others.)

UPD: I set up the Fluent Bit parser using "config-file-value": "/fluent-bit/configs/parse-json.conf" from the comments above.



You can use the Fluent Bit Nest filter for that purpose; please refer to the following documentation:

https://docs.fluentbit.io/manual/filter/nest

Comments

OP - "The documentation for Filters and Parsers is really poor and not clear." I've spent plenty of time with the docs, which is exactly why I ended up asking this question.
The documentation is EXTREMELY lacking
Nest is actually the opposite of what's required here, as we want to un-nest.
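
For the record, the Nest filter's lift operation does un-nest, but only an already-structured map; it cannot decode a JSON string like the log field in this question, which is why the parser filter from the accepted answer is needed first. A sketch of lift, assuming log had already been parsed into a nested map:

[FILTER]
    Name         nest
    Match        hello_*
    Operation    lift
    Nested_under log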
