
I am trying to find a way, in the Fluent Bit config, to tell/force ES to store plain JSON-formatted logs (the log bit below that comes from Docker stdout/stderr) in a structured way; please see the image at the bottom for a better explanation. For example, apart from (or along with) storing the log as a plain JSON entry under the log field, I would like to store each property individually, as shown in red.

The documentation for Filters and Parsers is really poor and not clear. On top of that, the forward input doesn't have a "parser" option. I tried the json/docker/regex parsers but no luck. My regex is here if I have to use regex. Currently using ES (7.1), Fluent-bit (1.1.3) and Kibana (7.1), not Kubernetes.

If anyone can direct me to an example or provide one, it would be much appreciated.

Thanks

{
  "_index": "hello",
  "_type": "logs",
  "_id": "T631e2sBChSKEuJw-HO4",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2019-06-21T21:34:02.000Z",
    "tag": "php",
    "container_id": "53154cf4d4e8d7ecf31bdb6bc4a25fdf2f37156edc6b859ba0ddfa9c0ab1715b",
    "container_name": "/hello_php_1",
    "source": "stderr",
    "log": "{\"time_local\":\"2019-06-21T21:34:02+0000\",\"client_ip\":\"-\",\"remote_addr\":\"192.168.192.3\",\"remote_user\":\"\",\"request\":\"GET / HTTP/1.1\",\"status\":\"200\",\"body_bytes_sent\":\"0\",\"request_time\":\"0.001\",\"http_referrer\":\"-\",\"http_user_agent\":\"curl/7.38.0\",\"request_id\":\"91835d61520d289952b7e9b8f658e64f\"}"
  },
  "fields": {
    "@timestamp": [
      "2019-06-21T21:34:02.000Z"
    ]
  },
  "sort": [
    1561152842000
  ]
}


My fluent-bit.conf:

[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    debug
    Parsers_File parsers.conf

[INPUT]
    Name   forward
    Listen 0.0.0.0
    Port   24224

[OUTPUT]
    Name  es
    Match hello_*
    Host  elasticsearch
    Port  9200
    Index hello
    Type  logs
    Include_Tag_Key On
    Tag_Key tag


5 Answers


The solution is as follows (the expected result is sketched after the config).

[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    debug
    Parsers_File parsers.conf

[INPUT]
    Name         forward
    storage.type filesystem
    Listen       my_fluent_bit_service
    Port         24224

[FILTER]
    Name         parser
    Parser       docker
    Match        hello_*
    Key_Name     log
    Reserve_Data On
    Preserve_Key On

[OUTPUT]
    Name            es
    Host            my_elasticsearch_service
    Port            9200
    Match           hello_*
    Index           hello
    Type            logs
    Include_Tag_Key On
    Tag_Key         tag

parsers.conf:

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # Command         | Decoder      | Field | Optional Action
    # ================|==============|=======|=================
    Decode_Field_As     escaped_utf8   log     do_next
    Decode_Field_As     json           log
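
With Reserve_Data and Preserve_Key on, the decoded properties are added alongside the original keys, so each one becomes an individually searchable field. Roughly, the _source of the example document from the question should end up looking like this (a sketch, with the log string trimmed for brevity):

{
  "@timestamp": "2019-06-21T21:34:02.000Z",
  "tag": "php",
  "container_id": "53154cf4d4e8d7ecf31bdb6bc4a25fdf2f37156edc6b859ba0ddfa9c0ab1715b",
  "container_name": "/hello_php_1",
  "source": "stderr",
  "log": "{\"time_local\":\"2019-06-21T21:34:02+0000\", ...}",
  "time_local": "2019-06-21T21:34:02+0000",
  "client_ip": "-",
  "remote_addr": "192.168.192.3",
  "remote_user": "",
  "request": "GET / HTTP/1.1",
  "status": "200",
  "body_bytes_sent": "0",
  "request_time": "0.001",
  "http_referrer": "-",
  "http_user_agent": "curl/7.38.0",
  "request_id": "91835d61520d289952b7e9b8f658e64f"
}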

Comments

Thank you very much for this answer. The documentation is simply horrendous. One question: how is your log entry ultimately decoded? I get a line of key=value pairs (such as name=john age=27 city=paris) and not a decoded structure (it is not a JSON string anymore, but not a structure visible in Kibana either).
Not sure I understand exactly what you mean, but my application logs are in JSON format by default. So your example would be {"name":"john","age":"27","city":"paris"} if it were my application. This whole string then also looks the same in Kibana under the log key, as shown in the image above. I hope that helps. Also have a look at this for a much more detailed example.
Sorry for not having been clear. I used to have {"name":"john","age":"27","city":"paris"} as the message entry in my log, displayed as such by Kibana. I was hoping that this entry could be decoded by Fluent Bit so that it reaches Elasticsearch as a true JSON entry, and so that I get the keys name, age and city as fields (at the same level as your tag or source entries).
(cont'd) What I have is still a message entry, which is now name=john age=27 city=paris (instead of the JSON string representation from before). I was wondering if this is the expected behaviour (which would make the decoder useless, because I cannot search on the key city, for instance).
In other words, the entry under message has been rewritten from the string {"name":"john","age":"27","city":"paris"} into the string name=john age=27 city=paris, which is not the parsing I expected (→ to "explode" the JSON string into actual fields for Kibana).
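
A quick way to see what Fluent Bit actually emits, before Kibana's rendering gets involved, is to add a temporary stdout output formatted as JSON; a minimal sketch, assuming the hello_* tags from this question:

[OUTPUT]
    Name   stdout
    Match  hello_*
    Format json_lines

If the printed record has name, age and city as top-level keys, the parsing worked and the key=value form is just how it is being displayed; if it still shows a single message string, the parser filter is not being applied to that key.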

Based on a log file with JSON objects separated by newlines, I was able to get it working with this config. No filters/decoders necessary.

The key point was to create a JSON parser, and set the parser name in the INPUT section.

Note that the tail docs say you should set up the DB option (a sketch of that is included after the config below). This is just a minimal config to get it working.

fluent-bit.conf:

[SERVICE]
  Parsers_File parsers.conf

[INPUT]
  Name tail
  Parser myparser
  Path /json_objs_separated_by_newlines.log

[OUTPUT]
  Name es
  Host elasticsearch
  # Required for Elasticsearch 8+
  Suppress_Type_Name On

parsers.conf:

[PARSER]
  Name myparser
  Format json
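
For completeness, a sketch of the DB option mentioned above; the path here is just an example, any writable location works:

[INPUT]
  Name   tail
  Parser myparser
  Path   /json_objs_separated_by_newlines.log
  # Track file offsets across restarts so lines are not re-read
  DB     /var/lib/fluent-bit/tail.db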



Answering for a more general use case: when using Firelens with the aws-for-fluent-bit image, the message ends up under a top-level log key, like here:

{
    "log": "{\"time_local\":\"2019-06-21T21:34:02+0000\",\"client_ip\":\"-\",\"remote_addr\":\"192.168.192.3\",\"remote_user\":\"\",\"request\":\"GET / HTTP/1.1\",\"status\":\"200\",\"body_bytes_sent\":\"0\",\"request_time\":\"0.001\",\"http_referrer\":\"-\",\"http_user_agent\":\"curl/7.38.0\",\"request_id\":\"91835d61520d289952b7e9b8f658e64f\"}"
}

Following this official AWS example, notice that a JSON parser config already exists in the image; it can be used like so (a sketch of what it amounts to is included after the result below):

"firelensConfiguration": {
    "type": "fluentbit",
    "options": {
        "config-file-type": "file",
        "config-file-value": "/fluent-bit/configs/parse-json.conf"
    }
}

Or it can be enabled with an environment variable:

"environment": [
    {
        "name": "aws_fluent_bit_init_file_1",
        "value": "/fluent-bit/configs/parse-json.conf"
    }
]

Result:

{
    "time_local": "2019-06-21T21:34:02+0000",
    "client_ip": "-",
    "remote_addr": "192.168.192.3",
    "remote_user": "",
    "request": "GET / HTTP/1.1",
    "status": "200",
    "body_bytes_sent": "0",
    "request_time": "0.001",
    "http_referrer": "-",
    "http_user_agent": "curl/7.38.0",
    "request_id": "91835d61520d289952b7e9b8f658e64f"
}
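
For reference, the parse-json.conf shipped in the image is essentially a parser filter applied to the log key. A rough sketch of what it amounts to (not necessarily the exact file contents, so check the image if you need the specifics):

[FILTER]
    Name         parser
    Match        *
    Key_Name     log
    Parser       json
    Reserve_Data True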



Result:

{
    "time_local": "2019-06-21T21:34:02+0000",
    "client_ip": "-",
    "remote_addr": "192.168.192.3",
    ...
    "request_id": "91835d61520d289952b7e9b8f658e64f"
}

How can the fields from the above result be set up as regular "Available Fields" in the OpenSearch dashboard? (In my case, different inputs in one index group share some fields and differ on others.)

UPD: I set up the Fluent Bit parser using "config-file-value": "/fluent-bit/configs/parse-json.conf" from the comments above.



You can use the Fluent Bit Nest filter for that purpose; please refer to the following documentation:

https://docs.fluentbit.io/manual/filter/nest

Comments

OP - "The documentation for Filters and Parsers is really poor and not clear." I've spent plenty of time with the docs, which is exactly why I ended up asking this question.
The documentation is EXTREMELY lacking
Nest is actually the opposite of what's required here, as we want to un-nest.
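
For the record, the Nest filter's lift operation does un-nest, but only an already-structured map; it cannot decode a JSON string like the log field in this question, which is why the parser filter from the accepted answer is needed first. A sketch of lift, assuming log had already been parsed into a nested map:

[FILTER]
    Name         nest
    Match        hello_*
    Operation    lift
    Nested_under log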
