bash: tail a live log file, counting unique lines with the same date/time


I'm looking for a good way to tail a live log file and display the number of lines with the same date/time.

Currently this is working:

 tail -F /var/logs/request.log | [cut the date-time] | uniq -c
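
For example, if the timestamp happened to occupy the first two space-separated fields (a hypothetical layout - adjust the field selection to your log format), the bracketed step might look like:

 tail -F /var/logs/request.log | cut -d' ' -f1-2 | uniq -c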

But the performance is not good enough: there is a delay of more than one minute, and the output arrives in bursts of a few lines at a time.

Any ideas?

There are 3 answers

Floris (Best Answer)

Your problem is most likely related to output buffering, not anything intrinsically wrong with your pipeline: when a utility writes to a pipe rather than a terminal, its stdout is typically block-buffered (often in 4 KB chunks) instead of line-buffered. I was able to create a test scenario where I could reproduce it - then make it go away. I hope it will work for you too.

Here is my test scenario. First, I write a short script that appends the time to a file roughly every 100 ms - this is my "log file", generating enough data that uniq -c should give me interesting output every second:

#!/bin/ksh
while :
do
  echo The time is `date` >> a.txt
  sleep 0.1
done

(Note: I had to use ksh, whose built-in sleep supports sub-second delays.)
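
On Linux with GNU coreutils, plain bash should work too, since the external sleep accepts fractional seconds - a minimal equivalent sketch:

#!/bin/bash
# append a timestamped line roughly every 100 ms
while :
do
  echo "The time is $(date)" >> a.txt
  sleep 0.1   # fractional seconds: GNU sleep (or ksh's builtin)
done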

In another window, I type

tail -f a.txt | uniq -c

Sure enough, you get the following output appearing every second:

   9 The time is Thu Dec 12 21:01:05 EST 2013
  10 The time is Thu Dec 12 21:01:06 EST 2013
  10 The time is Thu Dec 12 21:01:07 EST 2013
   9 The time is Thu Dec 12 21:01:08 EST 2013
  10 The time is Thu Dec 12 21:01:09 EST 2013
   9 The time is Thu Dec 12 21:01:10 EST 2013
  10 The time is Thu Dec 12 21:01:11 EST 2013
  10 The time is Thu Dec 12 21:01:12 EST 2013

etc., with no delays. Important to note: I did not yet attempt to cut out the time. Next, I did

tail -f a.txt | cut -f7 -d' ' | uniq -c

And your problem reproduced: the pipeline would "hang" for quite a while, until roughly 4 KB of output had accumulated in cut's buffer, and then dump it all at once.

A bit of searching online ( https://stackoverflow.com/a/16823549/1967396 ) pointed me to a utility called stdbuf. That answer describes almost exactly your scenario and provides the following workaround (paraphrased to match my scenario above):

tail -f a.txt | stdbuf -oL cut -f7 -d' ' | uniq -c

And that would be great… except that this utility doesn't exist on my machine (Mac OS) - it is specific to GNU coreutils. That left me unable to test it, although it may be a good solution for you.
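
If stdbuf is unavailable, another workaround worth trying - an assumption on my part, since I did not test it in this setup - is to let awk do both the field extraction and an explicit flush after every line; most awk implementations (gawk, mawk, BSD awk) provide fflush():

tail -f a.txt | awk '{print $7; fflush()}' | uniq -c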

Never fear - I found the following workaround, based on the socat command (which I honestly barely understand, but I adapted from the answer given at https://unix.stackexchange.com/a/25377 ).

Make a small file called tailcut.sh (this is the "long_running_command" from the link above):

#!/bin/ksh
tail -f a.txt | cut -f7 -d' '

Give it execute permissions with chmod 755 tailcut.sh. Then issue the following command:

socat EXEC:./tailcut.sh,pty,ctty STDIO | uniq -c

And hey presto - your lumpy output is lumpy no more. socat runs the script under a pseudo-terminal (the pty,ctty options), so cut believes it is writing to a terminal and line-buffers its output; each line reaches uniq as soon as it is produced.
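
On Mac OS you may also be able to get the same pseudo-terminal effect without socat, using the BSD script utility to run the command under a pty (a sketch, untested here):

script -q /dev/null ./tailcut.sh | uniq -c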

Jan Matějka

Consider how uniq -c works.

To print a count, it must read all the identical lines first; only when it sees a line that differs from the previous one can it print that line and its number of occurrences.

That's just how the algorithm fundamentally works and there is no way around it.

You can test this by running

touch a
tail -F a | uniq -c

Then run, one after another:

echo 1 >> a
echo 1 >> a
echo 1 >> a

Nothing happens. Only after you run

echo 2 >> a

does uniq print that there were 3 occurrences of "1\n".
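
To make the algorithm explicit, here is a minimal awk sketch of the same counting logic; it has the identical limitation, because the count for a line can only be printed once the next distinct line arrives:

tail -F a | awk '
  NR > 1 && $0 != prev { print cnt, prev; cnt = 0 }   # line changed: emit the count
  { prev = $0; cnt++ }                                # remember the line, bump the counter
  END { if (NR) print cnt, prev }                     # flush the final group at EOF
'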

Julien Palard

You may try logtop (apt-get install logtop):

Usage:

tail -F /var/logs/request.log | [cut the date-time] | logtop

Example:

$ tail -f /var/log/varnish/varnishncsa.log  | awk '{print $4}' | logtop
5585 elements in 10 seconds (558.50 elements/s)
   1  690 69.00/s [28/Mar/2015:23:13:48
   2  676 67.60/s [28/Mar/2015:23:13:47
   3  620 62.00/s [28/Mar/2015:23:13:49
   4  576 57.60/s [28/Mar/2015:23:13:53
   5  541 54.10/s [28/Mar/2015:23:13:54
   6  540 54.00/s [28/Mar/2015:23:13:55
   7  511 51.10/s [28/Mar/2015:23:13:51
   8  484 48.40/s [28/Mar/2015:23:13:52
   9  468 46.80/s [28/Mar/2015:23:13:50

Columns are, from left to right:

  • row number
  • count of occurrences seen (qty)
  • hits per second
  • the actual line