I want to mask all types of sensitive data (usernames, passwords, API keys, DB connection strings, endpoints, secrets, and any custom variables containing secrets) in a flat log file.

This is the script I'm currently using:

import re

def mask_secrets(log_file):
    # Read the log file
    with open(log_file, 'r') as file:
        log_data = file.read()

    # Define the pattern to search for sensitive data
    pattern = r'\b(\w+)\b:\s*(\w+)'

    # Mask the sensitive data in the log data
    log_data = re.sub(pattern, r'\1: ********', log_data)

    # Write the masked log data back to the file
    with open(log_file, 'w') as file:
        file.write(log_data)

# Usage example
log_file = 'path/to/your/log/file.txt'
mask_secrets(log_file)

However, it masks the minutes field of each timestamp, and it fails to mask some secrets, such as the DB connection string, the DB password, and custom variables containing secrets:

2022-01-01 12: ********:00 - User login successful - username: ********.doe, password: ********
2022-01-02 09: ********:15 - API request made - endpoint: /api/data, api_key: ********
2022-01-03 14: ********:22 - User login failed - username: ********.smith, password: ********
2022-01-04 18: ********:10 - API request made - endpoint: /api/data, api_key: ********
2022-01-06 17: ********:22 - DB Connection failed - DB String=guad8b237d7$vu87s, DB password=isbdihkaw978vw8a783wgfb
2022-01-07 19: ********:10 - API request made - endpoint: /api/data, api_key= xyz789s87dv7ghs
2022-01-07 19: ********:10 - User login failed - foo=uyai6d3ibdqi%*^^@%, bar=862479dhb7656%^&^%%^))_=

The regex in this script needs to be modified accordingly. Ideally, I would like to mask any value that appears on the right-hand side of =. Is that possible?
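
For reference, here is a minimal sketch of one pattern that might cover the cases above. It is not a definitive solution, and it rests on some assumptions: a key starts with a letter or underscore (so the digits of a timestamp never count as a key), the separator is either : or =, and a value runs up to the next comma or the end of the line. The names KEY_VALUE and mask_line are made up for this illustration, and the timestamps and credentials in the sample lines are invented.

```python
import re

# Assumed shape of a sensitive field: key, ':' or '=', then the value.
# The key must start with a letter or underscore, which keeps "12:34:00"
# from being treated as key/value pairs.
KEY_VALUE = re.compile(
    r'(?P<prefix>\b[A-Za-z_]\w*[ \t]*[:=][ \t]*)(?P<value>[^,\r\n]+)'
)


def mask_line(line, mask='********'):
    """Replace every value that follows 'key:' or 'key=' with the mask."""
    # Keep the key and separator (prefix) untouched; swap only the value.
    return KEY_VALUE.sub(lambda m: m.group('prefix') + mask, line)


if __name__ == '__main__':
    samples = [
        '2022-01-01 12:34:00 - User login successful - username: john.doe, password: secret',
        '2022-01-06 17:45:22 - DB Connection failed - DB String=guad8b237d7$vu87s, DB password=isbdihkaw978vw8a783wgfb',
        '2022-01-07 19:05:10 - User login failed - foo=uyai6d3ibdqi%*^^@%, bar=862479dhb7656%^&^%%^))_=',
    ]
    for sample in samples:
        print(mask_line(sample))
```

Because the value is defined as "everything up to the next comma", a secret that itself contains a comma would only be partly masked, and any ordinary text of the form word: text (for example Error: timeout) would be masked as well. A pattern like this cannot know which values are actually secret; it only masks by position.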

  • Do you understand the code you are currently using? If so, what do you not understand? If not, try harder to understand it, and then ask a more focussed question. Commented Apr 9, 2024 at 10:42
  • We cannot help you parse data you don't show us. We have no idea what the input looks like and no idea what the output would be. You can edit and add specific examples, but I can tell you now what you want is basically impossible since you will need the script to magically know what is and what is not "secret". You are just masking all sets of 1 or more letters or underscores followed by : and then another set of letters or underscores. Why would passwords only contain letters? As for masking after =, if you have written this script, how can that be an issue? Commented Apr 9, 2024 at 11:03
  • "Ideally, I would like to mask any value that's present on the right side of =. Is it possible to do so?", yes, of course. Would something like sed 's/=.*/<omitted>/' log.file be feasible? Commented Apr 9, 2024 at 13:47
  • @terdon Apologies for omitting the input data in my question, here it is: toptal.com/developers/paste-gd/7PpwzLRu. I think you did not understand my requirement and the issue I'm facing with my script. I fixed the masking of the timestamp's minutes field, so I now get the desired output for the log file, but not for the Groovy code in the link above, which contains a DB Name and Psswd. Writing a separate script for each type of file (log file and Groovy file) works fine, but I want a single script that works for any file that can be opened as text. Commented Apr 11, 2024 at 17:23
