I want to mask all types of sensitive data (usernames, passwords, api keys, DB connection strings, endpoints, secrets, and even any custom variables containing secrets) present in a flat log file.
Following is the script that I'm using currently:
import re
def mask_secrets(log_file):
# Read the log file
with open(log_file, 'r') as file:
log_data = file.read()
# Define the pattern to search for sensitive data
pattern = r'\b(\w+)\b:\s*(\w+)'
# Mask the sensitive data in the log data
log_data = re.sub(pattern, r'\1: ********', log_data)
# Write the masked log data back to the file
with open(log_file, 'w') as file:
file.write(log_data)
# Usage example
log_file = 'path/to/your/log/file.txt'
mask_secrets(log_file)
But it is masking the time field of timestamp and is not masking some secrets like DB Connection string and DB password, and custom variables containing secrets:
2022-01-01 12: ********:00 - User login successful - username: ********.doe, password: ********
2022-01-02 09: ********:15 - API request made - endpoint: /api/data, api_key: ********
2022-01-03 14: ********:22 - User login failed - username: ********.smith, password: ********
2022-01-04 18: ********:10 - API request made - endpoint: /api/data, api_key: ********
2022-01-06 17: ********:22 - DB Connection failed - DB String=guad8b237d7$vu87s, DB password=isbdihkaw978vw8a783wgfb
2022-01-07 19: ********:10 - API request made - endpoint: /api/data, api_key= xyz789s87dv7ghs
2022-01-07 19: ********:10 - User login failed - foo=uyai6d3ibdqi%*^^@%, bar=862479dhb7656%^&^%%^))_=
The regex used in this script needs to be modified accordingly. Ideally, I would like to mask any value that's present on the right side of =. Is it possible to do so?
:and then another set of letters or underscores. Why would passwords only contain letters? As for masking after=, if you have written this script, how can that be an issue?sed 's/=.*/<omitted>/' log.filebe feasible?