I am trying to search Apache log files for entries related to specific vulnerability scans. I need to match strings from a separate file against the URI content in the web logs. Some of the strings I am trying to find contain repeating special characters like '?'.
For example, I need to match an attack that contains exactly the string '????????', but I don't want to be alerted on the string '??????????????????', because each attack is tied to a specific attack ID number. Therefore, using:
if attack_string in log_file_line:
    alert_me()
...will not work: '????????' is also a substring of '??????????????????', so the longer attack triggers the alert as well. So I decided to put the string into a regex:
if re.findall(re.escape(attack_string), log_file_line):
    alert_me()
...which did not work either, because any log line containing '????????' is matched, even when the line contains more than eight consecutive '?'.
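A quick check shows why the plain substring test and the escaped regex behave identically here. The log line below is a made-up example: `re.escape` only turns each '?' into a literal `\?`, so the pattern still says nothing about what surrounds the match, and it happily matches inside a longer run.

```python
import re

attack_string = "????????"  # the 8-character signature
# Hypothetical log line containing a run of 18 '?' characters
log_file_line = "GET /" + "?" * 18 + " HTTP/1.1"

# Plain substring test: the 8 '?' sit inside the 18-character run
print(attack_string in log_file_line)  # True

# re.escape makes each '?' literal, but the pattern can still match anywhere
pattern = re.escape(attack_string)
print(pattern)  # \?\?\?\?\?\?\?\?
print(re.findall(pattern, log_file_line))  # ['????????', '????????']
```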
I then tried adding boundaries to the regex:
if re.findall(r'\\B%s\\B' % re.escape(attack_string), log_file_line):
    alert_me()
...which stopped matching in both cases. I need to assign the search string dynamically, but I don't want to match every line that merely contains it as part of a longer run. How can I accomplish this?
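The boundary attempt most likely fails for two separate reasons, sketched below with hypothetical log lines. Inside a raw string, `\\B` reaches the regex engine as an escaped backslash followed by `B`, so the pattern demands a literal backslash and the letter B that these lines do not contain. And even the correct `\B` would not help: `\b` and `\B` are defined in terms of word characters, and the position between two '?' characters (both non-word) is always a non-boundary, so `\B` matches the exact run and the longer run alike.

```python
import re

attack_string = "????????"
long_run = "GET /" + "?" * 18 + " HTTP/1.1"   # hypothetical: should NOT alert
exact_run = "GET /" + "?" * 8 + " HTTP/1.1"   # hypothetical: should alert

# r'\\B' puts '\\B' into the pattern: a literal backslash, then the letter B
broken = r'\\B%s\\B' % re.escape(attack_string)
print(bool(re.search(broken, long_run)), bool(re.search(broken, exact_run)))  # False False

# Correct \B syntax: holds between '?' and '?' (and '?' next to '/' or ' '),
# so it cannot tell the two lines apart
nonboundary = r'\B%s\B' % re.escape(attack_string)
print(bool(re.search(nonboundary, long_run)), bool(re.search(nonboundary, exact_run)))  # True True
```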
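Not quite sure what you're asking, but if the goal is to match exactly eight '?' and not a longer run, a negative lookbehind and a negative lookahead can rule out a neighbouring '?' on either side:

r'(?<!\?)\?\?\?\?\?\?\?\?(?!\?)'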
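A minimal sketch of building such a pattern dynamically for each string loaded from the attack file. It assumes that "exact" means the match must not be directly preceded by the string's own first character or followed by its own last character; that fits runs like '????????', but for other signatures it is an assumption to check against real data. The helper `exact_run_pattern`, the attack IDs and the log lines are all hypothetical, and the list append stands in for `alert_me()`.

```python
import re


def exact_run_pattern(attack_string):
    """Compile a regex matching attack_string only when it is not directly
    preceded by its first character or followed by its last character.
    Assumes attack_string is non-empty."""
    first = re.escape(attack_string[0])
    last = re.escape(attack_string[-1])
    # (?<!x) and (?!y) are zero-width: they inspect one neighbouring character
    return re.compile(r'(?<!%s)%s(?!%s)' % (first, re.escape(attack_string), last))


# Hypothetical signatures and attack IDs (in practice, read from the separate file)
attack_ids = {"?" * 8: 1001, "?" * 18: 1002}
patterns = [(exact_run_pattern(s), attack_id) for s, attack_id in attack_ids.items()]

# Hypothetical log lines
log_lines = [
    "GET /" + "?" * 8 + " HTTP/1.1",
    "GET /" + "?" * 18 + " HTTP/1.1",
]

hits = []
for line in log_lines:
    for pattern, attack_id in patterns:
        if pattern.search(line):
            hits.append((attack_id, line))  # stand-in for alert_me()

for attack_id, line in hits:
    print(attack_id, line)
```

Each line now triggers only the signature whose run length it actually contains. Compiling each pattern once up front also avoids rebuilding the regex for every log line.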