Split file after n non-consecutive empty lines


I am trying to split a big text file after every n empty lines. The file uses exactly one empty line as the data separator, like below:

Lorem ipsum
Lorem ipsum
Lorem ipsum

Lorem ipsum
Lorem ipsum

Lorem ipsum

Lorem ipsum
Lorem ipsum

Lorem
Lorem

...

I have tried to use csplit:

csplit data.txt /^$/ {3}

My expectation is that after 3 empty lines (not consecutive; that is, once 3 empty lines have been processed) it splits the file, and then continues to do so for the rest of the input. But it actually splits the file at each empty line.

My expected files:

xx00

Lorem ipsum
Lorem ipsum
Lorem ipsum

Lorem ipsum
Lorem ipsum

Lorem ipsum

xx01

Lorem ipsum
Lorem ipsum

Lorem
Lorem

Any suggestions?

There are 4 answers

0
Renaud Pacalet On BEST ANSWER

With awk (tested with GNU and BSD awk):

awk -v max=3 '{print > sprintf("xx%02d", int(n/max))} /^$/ {n += 1}' file
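For readers who want to see what the one-liner does step by step, here is a more verbose sketch of the same idea. The names `count`, `out`, `prev` and the input file `sample.txt` are illustrative and not part of the answer above. The `close()` call is an addition, on the assumption that a large input could otherwise leave many output files open at once.

```shell
# Work in a scratch directory and build the sample input from the question.
cd "$(mktemp -d)"
printf '%s\n' 'Lorem ipsum' 'Lorem ipsum' 'Lorem ipsum' '' \
    'Lorem ipsum' 'Lorem ipsum' '' 'Lorem ipsum' '' \
    'Lorem ipsum' 'Lorem ipsum' '' 'Lorem' 'Lorem' > sample.txt

awk -v max=3 '
{
    # Chunk index = empty lines seen so far, divided by max (rounded down).
    out = sprintf("xx%02d", int(count / max))
    # When the target file changes, close the previous one (added here).
    if (out != prev) {
        if (prev != "") close(prev)
        prev = out
    }
    print > out
}
# Count the separator after printing it, so it stays in the current chunk.
/^$/ { count++ }
' sample.txt
```

Because the empty line is counted only after it has been written, each file ends with the separator that completed it, which matches the expected xx00 in the question.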
0
RARE Kpop Manifesto On
removed './xx00'
removed './xx01'
removed './awkprof.out'

    {m,g}awk '{
        print >> sprintf("xx%0*.f%.*s", __-(_~_),
                 int(_/__),_<_,_+=!NF) }' FS='^$' __=3

-rw-r--r--  1 501  75 Jun  8 09:19:10 2022 xx00
-rw-r--r--  1 501  37 Jun  8 09:19:10 2022 xx01


../../Desktop/testdiremptylines/

     1  Lorem ipsum
     2  Lorem ipsum
     3  Lorem ipsum
     4  
     5  Lorem ipsum
     6  Lorem ipsum
     7  
     8  Lorem ipsum
     9  

 xx00

     1  Lorem ipsum
     2  Lorem ipsum
     3  
     4  Lorem
     5  Lorem

 xx01
0
anubhava On

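This gawk command should also work, using an empty RS (paragraph mode, where each block between empty lines is one record). Note that RT is a GNU awk extension: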

awk -v n=3 -v RS= '{ORS=RT; print > sprintf("xx%02d", int((NR-1)/n))}' file
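If gawk is not available, a rough sketch of the same paragraph-mode approach without RT could look like the following. It assumes blocks are always separated by exactly one empty line, as in the question, and hard-codes that separator in ORS. The file name `sample.txt` and the variable `out` are illustrative only.

```shell
# Work in a scratch directory and build the sample input from the question.
cd "$(mktemp -d)"
printf '%s\n' 'Lorem ipsum' 'Lorem ipsum' 'Lorem ipsum' '' \
    'Lorem ipsum' 'Lorem ipsum' '' 'Lorem ipsum' '' \
    'Lorem ipsum' 'Lorem ipsum' '' 'Lorem' 'Lorem' > sample.txt

awk -v n=3 '
BEGIN {
    RS = ""        # paragraph mode: one record per block of non-empty lines
    ORS = "\n\n"   # assumption: every block is followed by one empty line
}
{
    # Records 1..n go to xx00, n+1..2n to xx01, and so on.
    out = sprintf("xx%02d", int((NR - 1) / n))
    print > out
}' sample.txt
```

One difference from the RT version: the last block always gets a trailing empty line here, even when the input does not end with one.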
0
dan On

awk is good for this.

Split every n empty lines, naming files with:

No leading zeroes:

awk -v n=3 -v f=0 '
$0 == "" {++c}
c <= n {print > ("xx" f)}
c==n {c=0; ++f}' file

With a minimum width, zero-padded:

awk -v n=3 -v width=2 '
BEGIN {f = sprintf("%0*d", width, 0)}
$0 == "" {++c}
c <= n {print > ("xx" f)}
c==n {c=0; ++f; f = sprintf("%0*d", width, f)}' file

To remove the trailing empty line in each file, just change c <= n to c < n.