List of unique headers recursively on files matching pattern


I want the unique headers for a bunch of csv files whose names contain ABC or XYZ.

Within a single directory, I can sort of get what I need with:

head -n 1 *.csv > first.txt
cat -A first.txt | tr ',' '\n' | sort | uniq

Of course, this isn't recursive and it includes all csv files, not just the ones I want.

If I do the following, I get the recursive search, but also a bunch of junk:

find . -type f -name "ABC*.csv" -o -name "XYZ*.csv" | xargs head -n 1 | tr ',' '\n' | sort | uniq

I'm on Windows 10 with MinGW64. I suppose I could use Python, but I feel so close to having it!

oguz ismail On BEST ANSWER

When head is given multiple files (xargs does that) it prints their names as well.
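To see where the junk comes from, here is a minimal sketch using two scratch files. The file names and contents are made up for the demonstration and are not from the question.

```shell
# Hypothetical scratch files, created only for this demonstration.
tmp=$(mktemp -d)
printf 'id,name\n1,a\n' > "$tmp/ABC1.csv"
printf 'id,date\n2,b\n' > "$tmp/XYZ1.csv"

# Two files in one call: head prints a "==> file <==" banner before each header.
multi=$(head -n 1 "$tmp/ABC1.csv" "$tmp/XYZ1.csv")
printf '%s\n' "$multi"

# One file per call: only the header line itself is printed.
single=$(head -n 1 "$tmp/ABC1.csv")
printf '%s\n' "$single"

rm -rf "$tmp"
```

Those banner lines are what ends up split on commas and sorted along with the real headers in the xargs pipeline.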

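Use find's -exec action instead, so that head runs once per file. For that to work you must force the precedence of the two name tests with escaped parentheses, \( -name 'ABC*.csv' -o -name 'XYZ*.csv' \), because -o binds more loosely than the implicit AND between the other tests. uniq is not needed either, since sort -u removes duplicates on its own. As a side note, you had better enclose literal strings in single quotes.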

find . -type f \( -name 'ABC*.csv' -o -name 'XYZ*.csv' \) -exec head -n 1 {} \; | tr ',' '\n' | sort -u
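The effect of the parentheses can be checked with a small sketch. The scratch files below are hypothetical; the extra sort on the grouped result is only there because find does not guarantee an output order.

```shell
# Hypothetical scratch files, created only for this demonstration.
tmp=$(mktemp -d)
printf 'a,b\n' > "$tmp/ABC1.csv"
printf 'c,d\n' > "$tmp/XYZ1.csv"

# Without parentheses, -exec attaches only to the second -name test,
# so the ABC file matches the first branch but head never runs on it.
ungrouped=$(find "$tmp" -type f -name 'ABC*.csv' -o -name 'XYZ*.csv' -exec head -n 1 {} \;)
printf 'ungrouped:\n%s\n' "$ungrouped"

# With parentheses, -exec applies to files matching either pattern.
grouped=$(find "$tmp" -type f \( -name 'ABC*.csv' -o -name 'XYZ*.csv' \) -exec head -n 1 {} \; | sort)
printf 'grouped:\n%s\n' "$grouped"

rm -rf "$tmp"
```

In the ungrouped form, -type f is also tied to the first pattern only, so a directory named like XYZ*.csv would be passed to head as well.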

If your files have DOS line endings, the above command will not work, though: the last column name of each header keeps a trailing carriage return. In that case, delete the carriage returns using tr or sed:

find . -type f \( -name 'ABC*.csv' -o -name 'XYZ*.csv' \) -exec head -n 1 {} \; | tr -d '\r' | tr ',' '\n' | sort -u
# or
find . -type f \( -name 'ABC*.csv' -o -name 'XYZ*.csv' \) -exec head -n 1 {} \; | sed 's/\r//; s/,/\n/g' | sort -u
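As a rough illustration of why the carriage returns matter, the sketch below builds two hypothetical CRLF files whose headers hold the same two columns in a different order. LC_ALL=C is set here only as an assumption to keep the comparison byte-wise and reproducible; it is not part of the commands above.

```shell
# Assumption: a byte-wise locale, so "id" and "id\r" always compare as different.
export LC_ALL=C

# Hypothetical scratch files with DOS (CRLF) line endings.
tmp=$(mktemp -d)
printf 'id,name\r\n1,a\r\n' > "$tmp/ABC1.csv"
printf 'name,id\r\nb,2\r\n' > "$tmp/XYZ1.csv"

# Without stripping \r, the last column of each header keeps a trailing
# carriage return, so "id" and "name" each show up in two variants.
raw=$(find "$tmp" -type f \( -name 'ABC*.csv' -o -name 'XYZ*.csv' \) -exec head -n 1 {} \; | tr ',' '\n' | sort -u)
printf '%s\n' "$raw" | wc -l

# With tr -d '\r' the variants collapse to the two real column names.
clean=$(find "$tmp" -type f \( -name 'ABC*.csv' -o -name 'XYZ*.csv' \) -exec head -n 1 {} \; | tr -d '\r' | tr ',' '\n' | sort -u)
printf '%s\n' "$clean"

rm -rf "$tmp"
```

Piping the raw result through cat -A, as in the question, makes the stray carriage returns visible as ^M.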