I have a .txt file in which a number of Snort alerts are written. I would like to search through this file, delete duplicate alerts, and keep only one of each. This is the code I am using so far:
import re

with open('SnortReportFinal', 'r') as f:
    file_lines = f.readlines()

cont_lines = []
for line in range(len(file_lines)):
    if re.search(r'\d:\d+:\d+', file_lines[line]):
        cont_lines.append(line)

for idx in cont_lines[1:]:  # skip one instance of the string
    file_lines[idx] = ""    # replace all others

with open('SnortReportFinal', 'w') as f:
    f.writelines(file_lines)
The regular expression matches the string I am searching for, e.g. 1:234:5. If it finds multiple instances of the same string, I would like it to delete the duplicates and keep only one. This does not work: every other alert is deleted, and only the first line that the expression matches is kept.
The file contains text like this:
[1:368:6] ICMP PING BSDtype [**]
[1:368:6] ICMP PING BSDtype [**]
[1:368:6] ICMP PING BSDtype [**]
[1:368:6] ICMP PING BSDtype [**]
where the [1:368:6] part can be any combination of numbers, e.g. [1:5476:5].
I would like my expected output to be only:
[1:368:6] ICMP PING BSDtype [**]
[1:563:2] ICMP PING BSDtype [**]
The rest of the duplicate lines should be deleted. By "rest" I mean that lines with different numbers are fine to keep; it is the lines with duplicate numbers I want removed.
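For reference, here is a minimal sketch of one possible approach, assuming the same file name and pattern as above (the names sig_pattern, seen and kept_lines are just illustrative): record each matched signature in a set and keep only the first line that contains it.

import re

sig_pattern = re.compile(r'\d:\d+:\d+')  # same pattern as above, e.g. matches 1:368:6
seen = set()       # signatures encountered so far
kept_lines = []    # lines to write back out

with open('SnortReportFinal', 'r') as f:
    for line in f:
        match = sig_pattern.search(line)
        if match:
            sig = match.group()
            if sig in seen:
                continue       # duplicate signature -> drop this line
            seen.add(sig)      # first occurrence -> remember it and keep the line
        kept_lines.append(line)

with open('SnortReportFinal', 'w') as f:
    f.writelines(kept_lines)

The difference from blanking cont_lines[1:] is that the skip is keyed on the matched text itself, so only lines whose signature has already appeared are dropped, while lines with new signatures (e.g. [1:5476:5]) are kept.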