Using Gawk and Printf in a Bash script

I am trying to split a file into smaller files with gawk and number the smaller files in the order they appear in the original file.

for i in *.txt
do
    gawk -v RS="START_of_LINE_to_SEPARATE" 'NF{ print RS$0 > "new_file_"++n".txt"}' "$i"
done

The output gives me: new_file_1.txt, new_file_2.txt, etc.

I would like the output to be: new_file_0001.txt, new_file_0002.txt, etc.

There are 2 answers

Tom Fenech (best answer)

Ignoring the issue of the outer loop and focusing on the awk part of the question, you can use sprintf to produce your filename:

gawk -v RS="START_of_LINE_to_SEPARATE" 'NF{ file = sprintf("new_file_%04d.txt", ++n) 
                                            print RS$0 > file }' "$i"

The format specifier %04d formats the number as a decimal integer, padded with leading zeros to a minimum width of 4. Numbers with more than four digits are printed in full, not truncated.
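To see what %04d does in isolation, here is a minimal sketch using the shell's own printf, which accepts the same conversion as awk's sprintf. The helper name padded_name is made up for illustration and is not part of the answer.

```shell
# Hypothetical helper: build a zero-padded file name from a counter.
# %d formats a decimal integer; the 0 flag and width 4 pad it with leading zeros.
padded_name() {
    printf 'new_file_%04d.txt' "$1"
}

# Try a short, a medium, and an over-long counter value.
for n in 1 42 12345; do
    padded_name "$n"
    echo
done
```

The three lines printed should be new_file_0001.txt, new_file_0042.txt and new_file_12345.txt: the width is a minimum, so the five-digit value is left untouched.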

If you want to go through all the .txt files and keep incrementing the counter, then you can get rid of the loop and pass them all to awk at once by changing "$i" to *.txt.
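As a rough sketch of that loop-free variant, the gawk call can be wrapped in a function that receives all input files at once, so n keeps counting across them. The wrapper name split_all and the close(file) call are my additions, not part of the answer. START_of_LINE_to_SEPARATE is the placeholder separator from the question.

```shell
# Hypothetical wrapper: split all given files in a single gawk run,
# so the counter n is shared across input files instead of restarting.
split_all() {
    gawk -v RS="START_of_LINE_to_SEPARATE" '
        NF {
            # Build the zero-padded output name for this record.
            file = sprintf("new_file_%04d.txt", ++n)
            print RS $0 > file
            # Added here as a precaution: close each file once written,
            # so a large number of records does not pile up open files.
            close(file)
        }' "$@"
}
```

It would be called as split_all *.txt. Because the output names also match *.txt, it is probably safer to run it in a directory that does not already contain new_file_*.txt files from an earlier run, or to write the output elsewhere.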

anubhava

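You can also format the counter in bash with printf -v and pass it to gawk as a variable. Note that the number here advances once per input file, so all records from one input file end up in the same new_file_NNNN.txt: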

for i in *.txt; do 
    printf -v num "%04d" $((++n))
    gawk -v num="$num" -v RS="START_of_LINE_to_SEPARATE" 'NF{
       print RS$0 > "new_file_" num ".txt"}' "$i"
done