Editing the last instance in a file


I have a huge text file (~1.5GB) with numerous lines ending with ".Ends".
I need a Linux one-liner (perl / awk / sed) to find the last place '.Ends' appears in the file and add a couple of lines before it.

I tried using tac twice, and stumbled with my perl:

When I use:
tac ../../test | perl -pi -e 'BEGIN {$flag = 1} if ($flag==1 && /.Ends/) {$flag = 0 ; print "someline\n"}' | tac
It first prints the "someline\n" and only then prints the .Ends. The result is:

.Ends
someline

When I use:
tac ../../test | perl -e 'BEGIN {$flag = 1} print ; if ($flag==1 && /.Ends/) {$flag = 0 ; print "someline\n"}' | tac
It doesn’t print anything.

And when I use:
tac ../../test | perl -p -e 'BEGIN {$flag = 1} print $_ ; if ($flag==1 && /.Ends/) {$flag = 0 ; print "someline\n"}' | tac
It prints everything twice:

.Ends
someline
.Ends

Is there a smooth way to perform this edit?
It doesn't have to follow my approach; I'm not picky.
Bonus: if the lines can come from a different file, that would be great (but it's really not a must).

Edit
test input file:

gla2 
fla3 
dla4 
rfa5 
.Ends
shu
sha
she
.Ends
res
pes
ges
.Ends  
--->
...
pes
ges
someline
.Ends  
# * some irrelevant junk * #

There are 7 answers

markp-fuso On BEST ANSWER

Inputs:

$ cat test.dat
dla4
.Ends
she
.Ends
res
.Ends
abc

$ cat new.dat
newline 111
newline 222

One awk idea that sticks with OP's tac | <process> | tac approach:

$ tac test.dat | awk -v new_dat="new.dat" '1;/\.Ends/ && !(seen++) {system("tac " new_dat)}' | tac
dla4
.Ends
she
.Ends
res
newline 111
newline 222
.Ends
abc

Another awk idea that replaces the dual tac calls with a dual-pass of the input file:

$ awk -v new_dat="new.dat" 'FNR==NR { if ($0 ~ /\.Ends/) lastline=FNR; next} FNR==lastline { system("cat "new_dat) }; 1' test.dat test.dat
dla4
.Ends
she
.Ends
res
newline 111
newline 222
.Ends
abc

NOTES:

  • both of these solutions write the modified data to stdout (same thing OP's current code does)
  • neither of these solutions modifies the original input file (test.dat)
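
Since both commands only write to stdout, getting the change back into the file takes one more step. Here is a minimal sketch of the two-pass variant that writes to a temporary file and renames it over the original. The scratch directory, the `.tmp` name and the `mv` step are my assumptions, not part of the answer, and I swapped `system("cat ...")` for a `getline` loop so the new lines go through awk's own output stream.

```shell
# Scratch directory so nothing real is touched (illustrative setup).
workdir=$(mktemp -d)
printf '%s\n' dla4 .Ends she .Ends res .Ends abc > "$workdir/test.dat"
printf '%s\n' 'newline 111' 'newline 222' > "$workdir/new.dat"

# Same file given twice: pass 1 finds the last marker, pass 2 injects before it.
awk -v new_dat="$workdir/new.dat" '
    # pass 1: remember the line number of the last match
    FNR==NR { if ($0 ~ /\.Ends/) lastline=FNR; next }
    # pass 2: emit the new lines just before it
    FNR==lastline { while ((getline line < new_dat) > 0) print line }
    1
' "$workdir/test.dat" "$workdir/test.dat" > "$workdir/test.dat.tmp" &&
    mv "$workdir/test.dat.tmp" "$workdir/test.dat"   # replace only if awk succeeded

cat "$workdir/test.dat"
```

The `&&` keeps the original untouched if awk fails part-way, which seems worth having on a 1.5GB file.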
Andre Wildberg On

First let grep do the searching, then inject the lines with awk.

$ cat insert
new content
new content

$ line=$(cat insert)

$ awk -v var="${line}" '
      NR==1{last=$1; next} 
      FNR==last{print var}1' <(grep -n "^\.Ends$" file | cut -f 1 -d : | tail -1) file
rfa5 
.Ends
she
.Ends
ges
.Ends  
ges
new content
new content
.Ends
ges
ges

Data

$ cat file
rfa5 
.Ends
she
.Ends
ges
.Ends  
ges
.Ends
ges
ges
sseLtaH On

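Using GNU sed: -i.bak edits the file in place and keeps a backup of the original with a .bak extension. Note that -z makes sed treat NUL as the line separator, so a text file without NUL bytes is read into memory as a single record.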

$ sed -Ezi.bak 's/(.*)(\.Ends)/\1newline\nnewline\n\2/' input_file
$ cat input_file
gla2
fla3
dla4
rfa5
.Ends
shu
sha
she
.Ends
res
pes
ges
.Ends
--->
...
pes
ges
someline
newline
newline
.Ends
markp-fuso On

Inputs:

$ cat test.dat
dla4
.Ends
she
.Ends
res
.Ends
abc

$ cat new.dat
newline 111
newline 222

One ed approach:

$ ed test.dat >/dev/null 2>&1 <<EOF
1
?.Ends
-1r new.dat
wq
EOF

Or as a one-liner:

$ ed test.dat < <(printf '%s\n' 1 '?.Ends' '-1r new.dat' wq) >/dev/null 2>&1

Where:

  • >/dev/null 2>&1 - brute force suppression of diagnostic and info messages
  • 1 - go to line #1
  • ?.Ends - search backwards in file for string .Ends (ie, find last .Ends in file)
  • -1r new.dat - move back/up 1 line (-1) in file and read in the contents of new.dat
  • wq - write and quit (aka save and exit)

This generates:

$ cat test.dat
dla4
.Ends
she
.Ends
res
newline 111
newline 222
.Ends
abc

NOTE: unlike OP's current code which writes the modified data to stdout, this solution modifies the original input file (test.dat)

zdim On

If the last instance of that phrase is far enough down the file, it helps performance greatly to process the file from the back, for example using File::ReadBackwards. This approach in fact helps in any case, as we read only what is strictly necessary (the part from the last instance of the phrase to the end of the file), and only once.

Since you need to add other text to the file before the last marker, we have to copy the rest of the file so as to be able to put it back after the addition.

use warnings;
use strict;
use feature 'say';
use Path::Tiny;
use File::ReadBackwards;
    
my $file = shift // die "Usage: $0 file\n"; 

my $bw = File::ReadBackwards->new($file);

my @rest_after_marker; 

while ( my $line = $bw->readline ) { 
    unshift @rest_after_marker, $line;
    last if $line =~ /\.Ends/;
}
# Position after which to add text and copy back the rest
my $pos = $bw->tell;    
$bw->close;

open my $fh, '+<', $file or die $!;    
seek $fh, $pos, 0;
truncate $fh, $pos;    
print $fh $_ for path("add.txt")->slurp, @rest_after_marker;

The new text to add before the last .Ends is presumed to be in a file add.txt.

The remaining question is how much of the file there is after the last .Ends marker. We copy all of that into memory to be able to write it back. If that is too much, copy it to a temporary file instead, use it from there, and remove that file at the end.
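
For comparison, the same "park the tail, truncate, append" idea can be sketched in shell with the remainder held in a temporary file rather than in memory. This is not the answer's method: grep scans the file from the front instead of reading backwards, and it assumes GNU grep (`-b` for byte offsets) and GNU coreutils `truncate`. The file names are illustrative, and there is no guard for the case where no marker is found.

```shell
workdir=$(mktemp -d)
printf '%s\n' dla4 .Ends res .Ends abc > "$workdir/data.txt"
printf '%s\n' 'added 1' 'added 2' > "$workdir/add.txt"

# Byte offset at which the last matching line starts (grep -b prefixes each match with it).
pos=$(grep -b '\.Ends' "$workdir/data.txt" | tail -n 1 | cut -d: -f1)

# Park everything from that offset onward in a temporary file instead of memory.
tail -c +"$((pos + 1))" "$workdir/data.txt" > "$workdir/rest.tmp"

# Cut the file at the offset, then append the new text and the parked remainder.
truncate -s "$pos" "$workdir/data.txt"
cat "$workdir/add.txt" "$workdir/rest.tmp" >> "$workdir/data.txt"
rm -f "$workdir/rest.tmp"

cat "$workdir/data.txt"
```

Only the part of the file from the last marker onward is rewritten; everything before it stays on disk as it was.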

Ed Morton On

Since you want to read the new lines from a file:

$ cat new
foo
bar
etc
$ tac file | awk 'NR==FNR{str=$0 ORS str; next} {print} $0==".Ends"{printf "%s", str; str=""}' new - | tac
gla2
fla3
dla4
rfa5
.Ends
shu
sha
she
.Ends
res
pes
ges
.Ends
--->
...
pes
ges
someline
foo
bar
etc
.Ends
# * some irrelevant junk * #

The above assumes the white space after .Ends on some lines of your posted sample input are a mistake. If they really can be present then change $0==".Ends" to /^\.Ends[[:space:]]*$/ or even /^[[:space:]]*\.Ends[[:space:]]*$/ if there might also be leading white space on those lines or just /\.Ends/ if there can be any chars before/after .Ends.
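
To make the first of those alternatives concrete, here is a small sketch with trailing blanks on the last marker line. The sample data and the scratch directory are made up for illustration, and it assumes an awk that understands POSIX character classes such as [[:space:]] (gawk does).

```shell
workdir=$(mktemp -d)
# Sample data: the last marker carries trailing blanks.
printf '%s\n' res .Ends pes ges '.Ends  ' junk > "$workdir/file"
printf '%s\n' foo bar etc > "$workdir/new"

tac "$workdir/file" |
awk '
    # first file: stack the new lines in reverse order
    NR==FNR { str = $0 ORS str; next }
    { print }
    # only the first marker seen still has text to print
    /^\.Ends[[:space:]]*$/ { printf "%s", str; str = "" }
' "$workdir/new" - |
tac > "$workdir/out"

cat "$workdir/out"
```

With the exact comparison $0==".Ends" the new lines would instead land before the earlier, blank-free marker in this sample.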

steffen On

Two general points in advance:

  1. When you pipe the output of perl to tac, it doesn't make sense to run perl -i for in-place edit.

  2. An unset $flag is false by default, so you can invert its meaning and drop the BEGIN block:

    - BEGIN {$flag = 1} if ($flag==1 && /.Ends/) {$flag = 0 ; print "..."}
    + if (!$f && /.Ends/) {$f=1; print "..."}
    

Now to the questions:

When I use:

tac ../../test | perl -pi -e 'BEGIN {$flag = 1} if ($flag==1 && /.Ends/) {$flag = 0 ; print "someline\n"}' | tac

It first prints the someline\n and only then prints the .Ends. The result is: .Ends\nsomeline.

Yes: because you're going backwards, the output ends up after .Ends. You can swap the order in which the current line and the new line are printed:

perl -pe 'if (!$f && /.Ends/) {$f=1 ; print $_ . "someline\n" ; $_=""}'

When I use:

tac ../../test | perl  -e 'BEGIN {$flag = 1} print ; if ($flag==1 && /.Ends/) {$flag = 0 ; print "someline\n"}' | tac

It doesn’t print anything.

You're just missing -n. With it, the command works:

perl -ne ...

[...] It prints everything twice:

No explanations needed for that :)

In general, using three commands is not a bad idea: You can avoid high memory usage by piping the perl output to a tmp file. Otherwise the second tac would need to keep the entire input in memory.
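
A rough sketch of that staging, using the -n variant from above. The names `reversed.tmp` and `result` and the scratch directory are placeholders of mine, and I escaped the dot in the pattern so it only matches a literal ".Ends".

```shell
workdir=$(mktemp -d)
printf '%s\n' she .Ends res pes ges .Ends > "$workdir/test"

# Stage 1: reverse, then add the line right after the first marker seen
# (which is the last marker in the original file).
tac "$workdir/test" |
perl -ne 'print; if (!$f && /\.Ends/) { $f = 1; print "someline\n" }' > "$workdir/reversed.tmp"

# Stage 2: reverse back from the temporary file, then clean up.
tac "$workdir/reversed.tmp" > "$workdir/result"
rm -f "$workdir/reversed.tmp"

cat "$workdir/result"
```

The second tac then reads a regular file rather than a pipe.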

awk looks very similar:

tac test | awk '!f && $0==".Ends" {print $0 ORS "newline2" ORS "newline1"; f=1; next}1' | tac