Linux: Pipe `find` text file list | `dos2unix` | `dd` command

What I'm attempting to do:

  • Line 1: find any .txt or .TXT file and pipe them into next command
  • Line 2: convert the .txt file to unix format (get rid of Windows line endings)
  • Line 3: delete the last line of the file, which is always blank
find "${TEMPDIR}" -name *.[Tt][Xx][Tt] | /
xargs dos2unix -k | /
dd if=/dev/null of="$_" bs=1 seek=$(echo $(stat --format=%s "$_" ) - $( tail -n1 "$_" | wc -c) | bc )

I can't pipe the output filename from xargs dos2unix -k | / into the third line; I get the following errors:

stat: cannot stat '': No such file or directory
tail: cannot open '' for reading: No such file or directory
dd: failed to open '': No such file or directory

Clearly I've wrongly assumed that "$_" would be enough to pass the output through the pipe.

How can I pipe the output (a text file) from xargs dos2unix -k into the third line, dd if=/dev/null of="$_" bs=1 seek=$(echo $(stat --format=%s "$_" ) - $( tail -n1 "$_" | wc -c) | bc )?

The solution for line 3 comes from an answer to another question on SO about removing the last line from a file, with this answer in particular being touted as a good solution for large files: https://stackoverflow.com/a/17794626/893766

There are 3 answers

anishsane On BEST ANSWER

Can this help?

find "${TEMPDIR}" -iname '*.txt' -exec dos2unix "{}" \; -exec sed -i '$d' "{}" \;
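
For illustration, the two -exec calls could plausibly be folded into a single sed invocation per batch of files. This is only a sketch, assuming GNU sed (for -i and the \r escape); the scratch directory and sample file name are invented for the demo.

```shell
#!/bin/sh
# Demo data: a scratch directory with one CRLF file ending in a blank line.
demo_dir=$(mktemp -d)
printf 'alpha\r\nbeta\r\n\r\n' > "$demo_dir/sample.TXT"

# One sed pass per file: strip a trailing CR from every line, then delete
# the last line. GNU sed -i treats each file separately, so $d applies to
# the last line of every file, not only the final file in the batch.
find "$demo_dir" -iname '*.txt' -exec sed -i -e 's/\r$//' -e '$d' {} +

# Inspect the result byte by byte.
od -c "$demo_dir/sample.TXT"
```

Using {} + instead of {} \; hands many file names to one sed process, which saves process startups when there are lots of files.
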
ColOfAbRiX On

You can try to substitute dos2unix with an explicit replace:

find "${TEMPDIR}" -iname '*.txt' -exec cat {} \; |
tr -d '\r' |
...

Since Windows line endings are \r\n, removing every occurrence of \r with tr leaves plain Unix \n line endings.

As for the find command, you can use the -iname option for a case-insensitive match and -exec to run a command on each file found.
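
Note that piping cat through tr merges every file into one stream, so the result cannot be written back to the individual files. A possible per-file variant is sketched below; the scratch directory, the sample files, and the .tmp suffix are assumptions made for the demo.

```shell
#!/bin/sh
# Demo data: two CRLF files, one with an upper-case extension.
demo_dir=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$demo_dir/a.txt"
printf 'three\r\n' > "$demo_dir/B.TXT"

# Run tr once per file, writing to a temporary file and then replacing the
# original, so every file keeps its own contents.
find "$demo_dir" -iname '*.txt' -exec sh -c '
    for f do
        tr -d "\r" < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done' sh {} +

# Inspect one of the results byte by byte.
od -c "$demo_dir/B.TXT"
```
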

tripleee On

If the file is really big, you are already messing up the efficiency by rewriting it with tr. Then, you are reading it a second time with tail just to get the index of the last line.

The least inefficient fix I can come up with is to replace dos2unix and dd with just one command which performs both functions, so you only read and write the output file once.

find "$TEMPDIR" -iname '*.txt' -exec perl -i -ne '
    print $line if defined $line; ($line = $_) =~ s/\015$//' {} \;

Your attempt to use $_ for the current file name doesn't work. In Bash, $_ expands to the last argument of the previously executed command; it is never set from data flowing through a pipe, and in the middle of a pipeline nothing has completed yet. One possible workaround (which I include only for illustration, not as a recommended solution) would be to run everything in xargs where you have access to {}, similarly to how it works in find -exec.
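
A quick way to check what $_ holds, as a sketch that assumes bash is installed (the file names are placeholders and no files are touched):

```shell
#!/bin/sh
# Ask bash for $_ right after a command with two arguments; the no-op
# command ":" ignores its arguments, so nothing is read or written.
last_arg=$(bash -c ': first.txt second.txt; printf "%s" "$_"')
printf 'value of $_ after the command: %s\n' "$last_arg"

# In a loop the file name lives in an ordinary variable instead, so every
# command in the loop body can refer to it.
for name in first.txt second.txt; do
    printf 'processing %s\n' "$name"
done
```
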

# -I{} makes xargs substitute each file name for {} inside the sh -c script
find "$TEMPDIR" -iname '*.txt' -print0 |
xargs -r0 -I{} sh -c 'dos2unix -k "{}"
    dd if=/dev/null of="{}" bs=1 seek=$(
        echo $(stat --format=%s "{}" ) - $( tail -n1 "{}" | wc -c) | bc)'

I added -print0 and the corresponding xargs -0, as well as xargs -r, as illustrations of good form; though NUL-terminated output began as a GNU find extension and may not be available in strictly POSIX tools on other platforms.
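
The difference -print0 makes can be seen with a file name that contains a space. This is a rough sketch assuming GNU find and xargs; the scratch directory and file name are invented, and the inner sh -c 'echo "$#"' simply reports how many arguments it received.

```shell
#!/bin/sh
# Demo data: one empty file whose name contains a space.
demo_dir=$(mktemp -d)
: > "$demo_dir/my notes.txt"

# NUL-separated: the name arrives as a single argument.
nul_count=$(find "$demo_dir" -iname '*.txt' -print0 |
    xargs -r0 sh -c 'echo "$#"' sh)

# Whitespace-separated: the same name is split at the space.
plain_count=$(find "$demo_dir" -iname '*.txt' |
    xargs sh -c 'echo "$#"' sh)

echo "with -print0/-0: $nul_count argument(s); without: $plain_count"
```
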

(Privately, I would probably also replace the seek calculation with a simple Awk script, rather than expend three processes on performing a subtraction.)
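
This is not the Awk script mentioned above, but a similar saving is possible with plain shell arithmetic, which avoids the echo and bc processes. The sketch assumes GNU stat and GNU truncate (swapped in here for the dd seek trick); the scratch file is invented for the demo.

```shell
#!/bin/sh
# Demo data: two lines followed by a blank last line (already LF-only).
demo_dir=$(mktemp -d)
file=$demo_dir/sample.txt
printf 'alpha\nbeta\n\n' > "$file"

size=$(stat --format=%s "$file")        # total size in bytes (GNU stat)
last_len=$(tail -n 1 "$file" | wc -c)   # bytes in the last line

# Cut the file at the start of its last line.
truncate -s "$((size - last_len))" "$file"

# Inspect the result byte by byte.
od -c "$file"
```
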