Faster bash writing to file


I'm reading a file in bash, taking the values out of each line, and saving them to another file. The file has ~100k lines, and it takes around 25 minutes to read and rewrite them all.

Is there a faster way to write to a file? Right now I'm just iterating through the lines, parsing some values, and saving them like this:

while read line; do
   zip="$(echo "$line" | cut -c 1-8)"
   echo $zip
done < file_one.txt

Everything works fine and the values are parsed correctly; I just want to know how I can optimize the process (if I even can).

Thanks

There are 4 answers

Petr Skocik (BEST ANSWER)

The bash loop only slows things down, especially the part where you invoke an external program (cut) once per iteration. You can do all of it in a single cut:

cut -c 1-8 file_one.txt
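
Since the question also wants the parsed values saved to another file, the same single cut can write straight to an output file. This is just a sketch; file_two.txt is a placeholder name, not something from the question:

# One cut invocation reads the whole input and writes every parsed value in one pass.
# file_two.txt is a placeholder output name.
cut -c 1-8 file_one.txt > file_two.txt
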
John B

If you want to act on a line's substring only when it meets some condition, Awk is built for manipulating text files:

awk '{zip=substr($0, 1, 8)} zip == "my match" {print zip}' file_one.txt

In this example, substr($0, 1, 8) extracts characters 1 through 8 of each line record ($0) in file_one.txt. Each substring is assigned to the zip variable and printed only when it matches the text "my match".

If you're unfamiliar with Awk and routinely have large files that need to be manipulated, I recommend investing some time to learn it. Awk is far faster and more efficient than bash read loops. The blog post "Awk in 20 Minutes" is a good, quick introduction.

To shave even more time off on large files, you can use Mawk, a version of Awk optimized for speed.
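
As a rough sketch, assuming mawk is installed, the same program runs unchanged under it; zips.txt is a hypothetical output file name, not from the answer above:

# Same Awk program as above, run under mawk; zips.txt is a placeholder output file.
mawk '{zip=substr($0, 1, 8)} zip == "my match" {print zip}' file_one.txt > zips.txt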

chepner

Calling cut once for each line is a big bottleneck. Use bash's substring expansion instead to grab the first 8 characters of each line.

while IFS= read -r line; do
   zip=${line:0:8}
   echo "$zip"
done < file_one.txt
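
If the goal (as in the question) is to save the parsed values to another file, one more sketch worth noting, with file_two.txt as a placeholder output name, is to redirect the whole loop once instead of appending to the file inside the loop on every iteration:

# Redirect the loop's output once; using >> inside the loop would reopen
# the output file on every iteration and slow things down again.
# file_two.txt is a placeholder name, not from the original question.
while IFS= read -r line; do
   printf '%s\n' "${line:0:8}"
done < file_one.txt > file_two.txt
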
Ethan A.

I would go with this, since it only executes cut once:

while IFS= read -r line; do
   echo "$line"
done < <(cut -c 1-8 file_one.txt)
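
To check the difference on your own data, a quick and unscientific comparison with bash's time keyword might look like the sketch below; output is sent to /dev/null so only the processing time is measured:

# Time the single-cut approach.
time cut -c 1-8 file_one.txt > /dev/null

# Time the original per-line cut loop for comparison.
time while read -r line; do
   zip="$(echo "$line" | cut -c 1-8)"
   echo "$zip"
done < file_one.txt > /dev/null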