I looked through the related questions and could not find a solution. I'm pretty new to Python. Here's what I've got.
I set up a honeypot on an Ubuntu VM that watches for access attempts to my server, blocks them, and then writes the details of each attempt to a plain-text file. Each entry looks like this:
INTRUSION ATTEMPT DETECTED! from 10.0.0.1:80 (2022-06-06 13:17:24)
--------------------------
GET / HTTP/1.1
HOST: 10.0.0.1
X-FORWARDED-SCHEME http
X-FORWARDED-PROTO: http
x-FORWARDED-For: 139.162.191.89
X-Real-IP: 139.162.191.89
Connection: close
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X)
Accept: */*
Accept-Encoding: gzip
The text file just keeps growing with access attempts, but it's not in a format such as CSV that I can use in other programs. What I'd like to do is read this file, parse the information, and write it in CSV format to a separate file, then delete the contents of the original file to avoid duplicates.
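To make the goal concrete, here is a rough sketch of the read, parse, write, and clear steps. It assumes every entry starts with an "INTRUSION ATTEMPT DETECTED!" line shaped like the sample, followed by a dashed separator, the request line, and then the headers. The names parse_log, append_csv, convert, and FIELDS are placeholders I made up, and the column selection is only an example.

```python
import csv
import os
import re

# Pattern for the first line of each entry, based on the sample above.
HEADER_RE = re.compile(
    r"INTRUSION ATTEMPT DETECTED! from (?P<ip>[\d.]+):(?P<port>\d+) "
    r"\((?P<timestamp>[^)]+)\)"
)
# "Name: value" lines; the colon is optional because one sample line lacks it.
FIELD_RE = re.compile(r"([A-Za-z0-9-]+):?\s*(.*)")
# Hypothetical choice of CSV columns; header names are stored lowercased.
FIELDS = ["timestamp", "ip", "port", "request", "x-real-ip", "user-agent"]


def parse_log(text):
    """Return one dict per logged attempt."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            current = header.groupdict()
            entries.append(current)
        elif current is None or not line or set(line) == {"-"}:
            continue  # skip blank lines, separators, and stray leading text
        elif "request" not in current:
            current["request"] = line  # first line after the separator
        else:
            field = FIELD_RE.match(line)
            if field:
                current[field.group(1).lower()] = field.group(2)
    return entries


def append_csv(entries, csv_path):
    """Append entries to the CSV, writing the header row only once."""
    new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=FIELDS, extrasaction="ignore", restval=""
        )
        if new_file:
            writer.writeheader()
        writer.writerows(entries)


def convert(log_path, csv_path):
    """Parse the log, append it to the CSV, then empty the log."""
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        entries = parse_log(fh.read())
    append_csv(entries, csv_path)
    open(log_path, "w").close()  # truncate the original file
    return len(entries)
```

Calling convert("honeypot.log", "attempts.csv") (both file names are made up) would append the new rows and empty the log. One caveat with this approach: anything the honeypot writes between the read and the truncate would be lost.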
I'm thinking that clearing the file after each read may not be needed, and that duplicates could instead be handled in the CSV file by looking for them and omitting them. However, I'm seeing multiple log entries containing the same IP address, meaning one host is attempting access multiple times, so clearing the original file after each read may be the better option.
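For the alternative of leaving the log alone and filtering duplicates on the CSV side, a rough sketch follows. It assumes that timestamp, X-Real-IP, and request line together identify one attempt, which is my guess and not something the honeypot guarantees. KEY_FIELDS, load_seen, and drop_duplicates are made-up names.

```python
import csv
import os

# Assumed identity of a single attempt; adjust to whatever is truly unique.
KEY_FIELDS = ("timestamp", "x-real-ip", "request")


def row_key(row):
    """Build a hashable key from the fields that identify an attempt."""
    return tuple(row.get(name) or "" for name in KEY_FIELDS)


def load_seen(csv_path):
    """Collect the keys of rows already present in the CSV file."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return {row_key(row) for row in csv.DictReader(fh)}


def drop_duplicates(rows, seen):
    """Return only rows whose key is not in seen, updating seen as it goes."""
    fresh = []
    for row in rows:
        key = row_key(row)
        if key not in seen:
            seen.add(key)
            fresh.append(row)
    return fresh
```

With this key, repeated attempts from the same IP at different times stay as separate rows, and only an entry that was re-read from the log is skipped. The trade-off is that two genuine attempts with the same request in the same second would collapse into one row.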