I wrote a simple algorithm that makes my logs more suitable for plotting. A logfile has the following structure:
timestamp bytes
and looks like this:
1485762953167032 517
1485762953167657 517
1485762953188416 517
1485762953188504 517
1485762953195641 151
1485762953196256 151
1485762953198736 216
1485762953200099 216
1485762953201115 1261
1485762953201658 151
1485762953201840 151
1485762953202040 1261
1485762953203387 216
1485762953204183 216
1485762953206935 549
1485762953207548 546
1485762953259335 306
1485762953260025 1448
1485762953260576 1448
1485762953261087 1448
1485762953261790 1500
1485762953263878 1500
1485762953264273 1448
1485762953264914 1500
I get the timestamp with gettimeofday(&t, NULL) and compute a microsecond-precision value as long long timestamp = (t.tv_sec * 1000000) + t.tv_usec.
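For context, the logging side does roughly the following (a minimal sketch; log_packet, logfile and packet_bytes are illustrative names, not taken from the real program):

#include <stdio.h>
#include <sys/time.h>

/* Sketch of the logging side: one "timestamp bytes" line per packet,
   timestamp in microseconds since the epoch. */
static void log_packet(FILE *logfile, size_t packet_bytes)
{
    struct timeval t;
    gettimeofday(&t, NULL);
    /* cast before multiplying so the product is computed in 64-bit arithmetic */
    long long timestamp = (long long)t.tv_sec * 1000000LL + t.tv_usec;
    fprintf(logfile, "%lld %zu\n", timestamp, packet_bytes);
}

(Casting tv_sec before the multiplication keeps the product from overflowing on platforms where time_t is 32 bits.)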
By "more suitable for plotting" I mean that the timestamp should be reduced to seconds precision instead of microseconds, so both columns are modified: all rows falling in the same second are merged into one.
My naive algorithm is something like this (the real script is in Bash; since not everybody likes Bash, here is its pseudocode): read every line of the log file, convert the timestamp from microseconds to seconds by dividing by one million, and if the converted timestamp matches the one from the previous line, update the running byte and packet counts (one packet per row); otherwise write out the finished group and start a new one.
for every LOGFILE {
    TIMESTAMP=0
    BYTES=0
    PACKETS=0
    echo "Creating file $LOGFILE.plt..."
    create file $LOGFILE.plt
    echo "Preparing $LOGFILE for plotting..."
    while read LINE from LOGFILE {
        # LINE is an array: LINE[0] contains the timestamp, LINE[1] the bytes
        TEMP_TIMESTAMP = LINE[0] / 1000000   # microseconds -> seconds
        if PACKETS == 0 {
            # first row: start the first group
            TIMESTAMP = TEMP_TIMESTAMP
            BYTES = LINE[1]
            PACKETS = 1
        }
        else if TIMESTAMP == TEMP_TIMESTAMP {
            # same second as the previous row: accumulate
            BYTES = BYTES + LINE[1]
            PACKETS += 1
        }
        else {
            # new second: flush the finished group and start a new one
            append "TIMESTAMP BYTES PACKETS" >> $LOGFILE.plt
            TIMESTAMP = TEMP_TIMESTAMP
            BYTES = LINE[1]
            PACKETS = 1
        }
    }
    if PACKETS != 0 {
        # flush the last group
        append "TIMESTAMP BYTES PACKETS" >> $LOGFILE.plt
    }
}
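For the sample above, every row falls in the same second, so the expected content of the .plt file is a single line:

1485762953 17751 24

(24 packets totalling 17751 bytes during second 1485762953).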
Since it reads every line of the logfile it is at least O(n), and for a long logfile it may take a while: is there a way I can shorten this time?
I can't do this aggregation in the program itself: it is built for realtime performance (streaming and other responsive services), so it has to do as little work as possible; it already writes the logs, and it can't spend extra effort post-processing them as well.