Suppressing summary information in `wc -l` output


I use the command wc -l to count the number of lines in my text files (I also want to sort the results through a pipe), like this:

wc -l $directory-path/*.txt | sort -rn

The output includes a "total" line, which is the sum of the lines of all files:

10 total
5 ./directory/1.txt
3 ./directory/2.txt
2 ./directory/3.txt

Is there any way to suppress this summary line? Or even better, to change the way the summary line is worded? For example, instead of "10", the word "lines" and instead of "total" the word "file".

12

There are 12 answers

5
F. Hauri  - Give Up GitHub On BEST ANSWER

Yet another sed solution!

1. short and quick

Since the total comes on the last line, $d is the sed command that deletes the last line.

wc -l $directory-path/*.txt | sed '$d'

2. with header line addition:

wc -l $directory-path/*.txt | sed '$d;1ilines total'

Unfortunately, there is no alignment.

3. With alignment: the left column is formatted to a width of 11 characters.

wc -l $directory-path/*.txt |
    sed -e '
        s/^ *\([0-9]\+\)/          \1/;
        s/^ *\([0-9 ]\{11\}\) /\1 /;
        /^ *[0-9]\+ total$/d;
        1i\      lines file'

This will do the job:

      lines file
          5 ./directory/1.txt
          3 ./directory/2.txt
          2 ./directory/3.txt

4. But if your wc version really could put the total on the first line:

This one is for fun, because I don't believe there is a wc version that puts the total on the first line, but...

This version drops the total line wherever it appears and adds a header line at the top of the output.

wc -l $directory-path/*.txt |
    sed -e '
        s/^ *\([0-9]\+\)/          \1/;
        s/^ *\([0-9 ]\{11\}\) /\1 /;
        1{
            /^ *[0-9]\+ total$/ba;
            bb;
           :a;
            s/^.*$/      lines file/
        };
        bc;
       :b;
        1i\      lines file' -e '
       :c;
        /^ *[0-9]\+ total$/d
    '

This is more complicated because we can't simply drop the first line, even if it is the total line: it has to be replaced by the header.
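
If the sed gymnastics above are more than you need, the same header-plus-aligned-column layout can be sketched with a plain loop and printf. This is only a sketch and not part of the answer above: the function name line_table is made up for illustration, and it runs wc once per file, so no total line is ever produced.

```shell
# Hypothetical helper (not from the answer above): print a header, then one
# right-aligned line count per file. wc reads each file on stdin, so it prints
# only the number and never a "total" line.
line_table() {
    printf '%11s %s\n' lines file
    for f in "$@"; do
        printf '%11d %s\n' "$(wc -l < "$f")" "$f"
    done
}
```

It could be called as line_table ./directory/*.txt. The price is one wc process per file, which matters only for very large numbers of files.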

2
jojo On

Not the most optimized way, since you can use combinations of cat, echo, coreutils, awk, sed, tac, etc., but this will get you what you want:

wc -l ./*.txt | awk 'BEGIN{print "Line\tFile"}1' | sed '$d'

wc -l ./*.txt will extract the line counts. awk 'BEGIN{print "Line\tFile"}1' will add the header titles. The trailing 1 is an always-true pattern with no action, so awk applies its default action and prints every input line unchanged. sed '$d' will print all lines except the last one.

Example Result

Line    File
      6 ./test1.txt
      1 ./test2.txt
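
The awk and sed steps can also be folded into one awk program. The following is a rough sketch rather than this answer's own method: wc_with_header is a name made up here, and like sed '$d' it always drops the last line, so it assumes at least two files are passed (with a single file there is no total line and the only count would be dropped).

```shell
# Hypothetical wrapper: header first, then every wc line except the last.
wc_with_header() {
    wc -l "$@" | awk '
        BEGIN  { print "Line\tFile" }  # header, printed before any input is read
        NR > 1 { print prev }          # print the line held back on the previous step
               { prev = $0 }           # hold the current line; the last one is never printed
    '
}
```

Usage would look like wc_with_header ./*.txt.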
2
Mark Setchell On

You can solve it (and many other problems that appear to need a for loop) quite succinctly using GNU Parallel like this:

parallel wc -l ::: tmp/*txt

Sample Output

   3 tmp/lines.txt
   5 tmp/unfiltered.txt
  42 tmp/file.txt
   6 tmp/used.txt
3
codeforester On

The simplicity of using just grep -c

I rarely use wc -l in my scripts because of these issues. I use grep -c instead. Though it is not as efficient as wc -l, we don't need to worry about other issues like the summary line, white space, or forking extra processes.

For example:

/var/log# grep -c '^' *
alternatives.log:0
alternatives.log.1:3
apache2:0
apport.log:160
apport.log.1:196
apt:0
auth.log:8741
auth.log.1:21534
boot.log:94
btmp:0
btmp.1:0
<snip>

Very straightforward for a single file:

line_count=$(grep -c '^' my_file.txt)

Performance comparison: grep -c vs wc -l

/tmp# ls -l *txt
-rw-r--r-- 1 root root 721009809 Dec 29 22:09 x.txt
-rw-r----- 1 root root 809338646 Dec 29 22:10 xyz.txt

/tmp# time grep -c '^' *txt

x.txt:7558434
xyz.txt:8484396

real    0m12.742s
user    0m1.960s
sys     0m3.480s

/tmp/# time wc -l *txt
   7558434 x.txt
   8484396 xyz.txt
  16042830 total

real    0m9.790s
user    0m0.776s
sys     0m2.576s
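
Since the question also asks for sorted output, the grep -c form can be piped through sort by splitting on the colon. This is a sketch under assumptions: the function name count_lines_sorted is hypothetical, and it assumes no file name contains a colon. One behavioral difference is worth knowing: grep -c '^' also counts a final line that lacks a trailing newline, whereas wc -l counts newline characters only.

```shell
# Hypothetical helper: print "name:count" for each file, largest count first.
# Assumes file names contain no ':' since that character is the field separator.
count_lines_sorted() {
    grep -c '^' "$@" | sort -t : -k 2,2nr
}
```

When only one file is given, grep omits the file name and prints just the count.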
0
Walter A On

Can you use another wc?

The POSIX wc manual (man -s1p wc) says:
If more than one input file operand is specified, an additional line shall be written, of the same format as the other lines, except that the word total (in the POSIX locale) shall be written instead of a pathname and the total of each column shall be written as appropriate. Such an additional line, if any, is written at the end of the output.

You said the total line was the first line, but the manual states that it is the last, and other wc implementations don't show it at all. Removing the first or last line is therefore dangerous, so I would grep -v the line with the total (in the POSIX locale...), or just grep for the slash that is part of all the other lines:

wc -l $directory-path/*.txt | grep "/"
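
To see why matching the slash is safer than matching the word total, here is a small sketch. The name wc_no_total is invented for illustration, and the approach assumes every path handed to wc contains a slash (for example ./notes.txt rather than notes.txt). Under that assumption even a file that is literally named total is kept, while the summary line, which has no slash, is dropped.

```shell
# Hypothetical wrapper: keep only lines that contain a path separator.
# Assumes every argument contains a '/', e.g. ./file.txt or dir/file.txt.
wc_no_total() {
    wc -l "$@" | grep '/'
}
```

For instance, wc_no_total ./total ./1.txt would still list ./total.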
5
Keith Thompson On

This is actually fairly tricky.

I'm basing this on the GNU coreutils version of the wc command. Note that the total line is normally printed last, not first (see my comment on the question).

wc -l prints one line for each input file, consisting of the number of lines in the file followed by the name of the file. (The file name is omitted if there are no file name arguments; in that case it counts lines in stdin.)

If and only if there's more than one file name argument, it prints a final line containing the total number of lines and the word total. The documentation indicates no way to inhibit that summary line.

Other than the fact that it's preceded by other output, that line is indistinguishable from output for a file whose name happens to be total.

So to reliably filter out the total line, you'd have to read all the output of wc -l, and remove the final line only if the output has more than one line. (Even that can fail if you have files with newlines in their names, but you can probably ignore that possibility.)

A more reliable method is to invoke wc -l on each file individually, avoiding the total line:

for file in $directory-path/*.txt ; do wc -l "$file" ; done

And if you want to sort the output (something you mentioned in a comment but not in your question):

for file in $directory-path/*.txt ; do wc -l "$file" ; done | sort -rn

If you happen to know that there are no files named total, a quick-and-dirty method is:

wc -l $directory-path/*.txt | grep -v ' total$'

If you want to run wc -l on all the files and then filter out the total line, here's a bash script that should do the job. Adjust the *.txt as needed.

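#!/bin/bash

# Save wc's output in a temporary file so its lines can be counted first.
tmp=$(mktemp) || exit 1
wc -l *.txt > "$tmp"
lines=$(wc -l < "$tmp")
if (( lines <= 1 )) ; then
    # At most one line of output: there is no total line to drop.
    cat "$tmp"
else
    # Several files: the last line is the total, so print all but that one.
    head -n "$(( lines - 1 ))" "$tmp"
fi
rm -f "$tmp"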

Another option is this Perl one-liner:

wc -l *.txt | perl -e '@lines = <>; pop @lines if scalar @lines > 1; print @lines'

@lines = <> slurps all the input into an array of strings. pop @lines discards the last line if there is more than one line, i.e., if the last line is the total line.
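
The same rule (drop the last line only when there is more than one line) also fits in a short sed script. This is a sketch, not taken from the answer above; the function name drop_total is hypothetical, and it shares the caveat already mentioned about file names containing newlines.

```shell
# Hypothetical filter: read wc output on stdin and drop the total line,
# but only if the output has more than one line.
drop_total() {
    # 1!  : apply the block to every line except the first
    # $d  : inside the block, delete the line if it is the last one
    sed '1!{$d;}'
}
```

It would be used as wc -l *.txt | drop_total. A single-file run passes through unchanged, because its only line is line 1.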

7
V. Michel On

The wc program always displays the total when there are two or more files (fragment of wc.c):

if (argc > 2)
     report ("total", total_ccount, total_wcount, total_lcount);
   return 0;

So the easiest approach is to run wc on only one file at a time and let find hand the files to wc one after the other:

find $dir -name '*.txt' -exec wc -l {} \;

Or as specified by liborm.

dir="."
find $dir -name '*.txt' -exec wc -l {} \; | sort -rn | sed 's/\.txt$//'
1
b_squared On

This is a job tailor-made for head:

wc -l $directory-path/*.txt | head --lines=-1

This way, wc still runs only once. Note that a negative count for head is a GNU extension.

0
Ivan Zarea On

Similar to Mark Setchell's answer, you can also use xargs with an explicit replacement string:

ls | xargs -I% wc -l %

This way xargs does not pass all the inputs to wc at once, but runs wc once per input line.

0
Allen Supynuk On

Shortest answer:

ls | xargs -l wc
0
jimjam100 On

What about using sed's pattern-based delete, as below? It removes the total line only if it is present (but it also removes any files with total in their names).

wc -l $directory-path/*.txt | sort -rn | sed '/total/d'

0
MaratC On

While most of the answers center around removing the unneeded line, or using a version of wc that allows suppressing it, there's something to be said in favor of never producing it in the first place.

So you want to count lines in $directory-path/*.txt files; however, feeding several files to wc will produce the total, which you don't want.

I would change your pipeline to find the files and feed them to wc one by one, in this manner:

find $directory-path -name "*.txt" | xargs -L 1 wc -l | sort -rn

In this case, find is tasked with locating files, while xargs -L 1 is tasked with feeding them to wc one by one.
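
One caveat with piping find into xargs -L 1 is that xargs still splits each input line on whitespace, so a file name containing a space is handed to wc in pieces. A possible variant, assuming a find and xargs that support -print0 and -0 (the GNU and BSD versions do, but they are not guaranteed on every system), passes NUL-terminated names instead. The function name count_txt_lines and its dir parameter are placeholders for illustration.

```shell
# Hypothetical helper: count lines of every *.txt file under a directory,
# one wc invocation per file, sorted with the largest count first.
# Assumes find supports -print0 and xargs supports -0 (GNU and BSD do).
count_txt_lines() {
    dir=$1
    find "$dir" -name '*.txt' -print0 | xargs -0 -n 1 wc -l | sort -rn
}
```

Here -n 1 plays the role that -L 1 plays in the pipeline above: each file name becomes its own wc call, so no total line is produced.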