I am extracting the IP address and HTTP method from each line of a log file and printing them with the code below:

import re

for line in data:
    g = re.findall(r'([\d.]+).*?(GET|POST|PUT|DELETE)', line)
    print(g)

[('1.1.1.1', 'PUT')]
[('2.2.2.2', 'GET')]
[('1.1.1.1', 'PUT')]
[('2.2.2.2', 'POST')]

How can I add these up to get the following output?

Desired output:

1.1.1.1: PUT = 2
2.2.2.2: GET = 1, POST = 1
  • It's not clear what you mean by "How to add to the output". Commented Jul 23, 2019 at 7:36
  • Are you trying to count, for each address, the number of occurrences of each request type? Commented Jul 23, 2019 at 7:40

5 Answers

You could use a dictionary to count:

# initialize the count dict
count_dict= dict()
for line in data:
    g = re.findall(r'([\d.]+).*?(GET|POST|PUT|DELETE)', line)
    for tup in g:
        # get the current count for tuple tup, falling back to 0
        # (the second argument to .get) if we haven't seen it yet
        num= count_dict.get(tup, 0)
        # increase the count and write it back
        count_dict[tup]= num+1
# now iterate over the key (tuple) / value (count) pairs
# and print the result
for tup, count in count_dict.items():
    print(tup, count)
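collections.Counter implements the same "start at 0, then add 1" pattern for you. Here is a minimal sketch that uses the (ip, method) tuples re.findall returned in the question instead of real log lines. Inside the loop over data you could call count_dict.update(g) for each line's matches.

```python
from collections import Counter

# The (ip, method) tuples that re.findall returned in the question
matches = [('1.1.1.1', 'PUT'), ('2.2.2.2', 'GET'),
           ('1.1.1.1', 'PUT'), ('2.2.2.2', 'POST')]

# Counter counts each distinct tuple; missing keys read as 0
count_dict = Counter(matches)

for tup, count in count_dict.items():
    print(tup, count)
```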

OK, I admit this doesn't give exactly the output you want, but you can build it from count_dict in a similar manner:

out_dict = dict()
for (ip, request_type), count in count_dict.items():
    out_str = out_dict.get(ip, '')
    # no separator before the first entry for this IP
    sep = '' if out_str == '' else ', '
    out_str = f'{out_str}{sep}{request_type} = {count}'
    out_dict[ip] = out_str

for ip, out_str in out_dict.items():
    print(ip, out_str)

For your data, this outputs:

1.1.1.1 PUT = 2
2.2.2.2 GET = 1, POST = 1
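If you collect the fragments in a list and join them with ', ', you no longer need to track the separator. Adding a colon after the IP then gives exactly the format asked for in the question. In this sketch, count_dict is hard-coded to the counts computed above for the question's data.

```python
from collections import defaultdict

# counts per (ip, request_type), as built by the counting loop above
count_dict = {('1.1.1.1', 'PUT'): 2, ('2.2.2.2', 'GET'): 1, ('2.2.2.2', 'POST'): 1}

# group "METHOD = count" fragments by IP, keeping first-seen order
parts = defaultdict(list)
for (ip, request_type), count in count_dict.items():
    parts[ip].append(f'{request_type} = {count}')

lines = [f"{ip}: {', '.join(fragments)}" for ip, fragments in parts.items()]
for line in lines:
    print(line)
```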

I would look at collections.Counter.

import re
from collections import Counter

results = []
for line in data:
    g = re.findall(r'([\d.]+).*?(GET|POST|PUT|DELETE)', line)
    # extend (rather than append g[0]) so lines without a match don't raise IndexError
    results.extend(g)
ip_list = set(result[0] for result in results)
for ip in ip_list:
    print(ip, Counter(result[1] for result in results if result[0] == ip))
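The loop above rescans all of results once for every IP. A single-pass alternative keeps one Counter per IP in a defaultdict, and unlike a set it also keeps the IPs in first-seen order. This is a sketch, with results hard-coded to the question's match tuples.

```python
from collections import Counter, defaultdict

results = [('1.1.1.1', 'PUT'), ('2.2.2.2', 'GET'),
           ('1.1.1.1', 'PUT'), ('2.2.2.2', 'POST')]

# one Counter per IP, created on first access and filled in a single pass
per_ip = defaultdict(Counter)
for ip, method in results:
    per_ip[ip][method] += 1

for ip, counts in per_ip.items():
    print(ip, counts)
```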

3 Comments

That's neat, I like it. I just changed the print line: you were missing the if ..., so your solution printed the same (total) counts for each IP. Now it works. It should read print(ip, Counter(result[1] for result in results if result[0] == ip))
Oh, I just double-checked: you could even drop ip_list, which would make it shorter still. You could just do for ip, request_types in results: and, in the loop, print(ip, Counter(request_types)).
Good catch with the if clause. I think Counter(request_types) would break it, though: Counter accepts any iterable or mapping, and request_types is a string, so Counter would output the letter counts of request_types.
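Here is a quick check of that point. A string is itself an iterable of characters, so Counter counts its letters instead of treating it as a single request type.

```python
from collections import Counter

# a string is an iterable of characters, so Counter counts letters
letters = Counter('PUT')
print(letters)  # Counter({'P': 1, 'U': 1, 'T': 1})

# wrapping the string in a list counts it as one item
methods = Counter(['PUT'])
print(methods)  # Counter({'PUT': 1})
```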

You can use collections.defaultdict.

Ex:

import re
from collections import defaultdict

result = defaultdict(list)
for line in data:
    for ip, method in re.findall(r'([\d.]+).*?(GET|POST|PUT|DELETE)', line):
        result[ip].append(method)

for k, v in result.items():
    temp = ""
    for i in set(v):
        temp += " {} = {}".format(i, v.count(i))
    print("{}{}".format(k, temp))
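Iterating over set(v) visits the methods in an arbitrary order. Because string hashing is randomized, GET and POST may even swap places between runs. dict.fromkeys(v) also drops duplicates, but it keeps first-seen order. This sketch hard-codes result to what the loop above builds from the question's data.

```python
# what result holds after the loop above, for the question's data
result = {'1.1.1.1': ['PUT', 'PUT'], '2.2.2.2': ['GET', 'POST']}

lines = []
for k, v in result.items():
    temp = ""
    # dict.fromkeys removes duplicates like set(), but keeps first-seen order
    for i in dict.fromkeys(v):
        temp += " {} = {}".format(i, v.count(i))
    lines.append("{}{}".format(k, temp))

print("\n".join(lines))
```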

from collections import Counter

x = [[('1.1.1.1', 'PUT')], [('2.2.2.2', 'GET')], [('1.1.1.1', 'PUT')], [('2.2.2.2', 'POST')]]
# step 1: convert x into a dict mapping each IP to its list of methods
m = {}
for i in x:
    a, b = i[0]
    if a not in m:
        m[a] = [b]
    else:
        # append in place; no need to rebind x, the list being iterated
        m[a].append(b)
print('new dict is {}'.format(m))

# step 2: count the frequency of each method per IP
m_values = list(m.values())
yy = []
for methods in m_values:
    pairs = []
    # build the Counter once instead of twice
    for method, count in Counter(methods).items():
        pairs.append(method + '=' + str(count))
    yy.append(pairs)

# step 3: update the value of the dict
# (relies on m.keys() and m.values() iterating in the same order)
m_keys = list(m.keys())
n = len(m_keys)
for i in range(n):
    m[m_keys[i]] = yy[i]

print("final dict is {}".format(m))

Output:

new dict is {'1.1.1.1': ['PUT', 'PUT'], '2.2.2.2': ['GET', 'POST']}
final dict is {'1.1.1.1': ['PUT=2'], '2.2.2.2': ['GET=1', 'POST=1']}
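Steps 2 and 3 can be merged into one dict comprehension, so keys and values no longer have to be matched up by index. The sketch below runs on the same x. It also iterates over every tuple in each inner list, so lines with zero or several matches would be handled too.

```python
from collections import Counter, defaultdict

x = [[('1.1.1.1', 'PUT')], [('2.2.2.2', 'GET')],
     [('1.1.1.1', 'PUT')], [('2.2.2.2', 'POST')]]

# step 1: group methods by IP
m = defaultdict(list)
for matches in x:
    for ip, method in matches:
        m[ip].append(method)

# steps 2 and 3: replace each list by its "METHOD=count" strings
final = {ip: [f'{method}={count}' for method, count in Counter(methods).items()]
         for ip, methods in m.items()}
print(final)
```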

Here is a very basic approach with no dependencies, using a plain dict for counting. Given the data_set:

data_set = [[('1.1.1.1', 'PUT')],
            [('2.2.2.2', 'GET')],
            [('2.2.2.2', 'POST')],
            [('1.1.1.1', 'PUT')]]

Initialize the counters (manually, for just a few verbs), then iterate over the data:

counter = {'PUT': 0, 'GET': 0, 'POST': 0, 'DELETE': 0}
res = {}

for data in data_set:
    ip, verb = data[0]
    if ip not in res:
        # copy the template so each IP gets its own counts
        res[ip] = dict(counter)
    # count every occurrence, including the first one
    res[ip][verb] += 1

print(res)
#=> {'1.1.1.1': {'PUT': 2, 'GET': 0, 'POST': 0, 'DELETE': 0}, '2.2.2.2': {'PUT': 0, 'GET': 1, 'POST': 1, 'DELETE': 0}}

You will still need to format the output to fit your needs.
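Here is one way to do that formatting, as a sketch. It hard-codes res to the per-IP verb counts for the question's data and drops verbs with a zero count, which yields the format asked for in the question.

```python
# per-IP verb counts for the question's data
res = {'1.1.1.1': {'PUT': 2, 'GET': 0, 'POST': 0, 'DELETE': 0},
       '2.2.2.2': {'PUT': 0, 'GET': 1, 'POST': 1, 'DELETE': 0}}

lines = []
for ip, verbs in res.items():
    # skip verbs that never occurred for this IP
    parts = [f'{verb} = {n}' for verb, n in verbs.items() if n > 0]
    lines.append(f"{ip}: {', '.join(parts)}")

print('\n'.join(lines))
```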