Select all lines after last occurrence of a certain character

TITLE
——
1 nuit 
1 nuit 
1 nuit 
1 nuit 
1 nuit 
total : 5 nuits 
——
1 nuit 
1 nuit
1 nuit
total : 3 nuits 
——
1 nuit 
1 nuit
1 nuit
1 nuit
1 nuit
1 nuit

and so on ...

I have this text, and I'd like to select the lines that come after the last occurrence of ——. The pattern should match the 6 lines that follow that last ——. I've tried pretty much everything that crossed my mind so far, but I must be missing something. I tested (?s:.*\s)\K——, which matches the last —— of the document, but I can't work out how to select the lines after that match.

The point here is to count the lines after that. So if I'm only able to select the "1" or "nuit" that's fine.

The expected capture:

1 nuit 
1 nuit
1 nuit
1 nuit
1 nuit
1 nuit

There are 3 answers

The fourth bird (best answer)

If there is no dangling —— on a line of its own at the bottom, you could write the pattern like this, but it gives you one single match containing all the lines:

(?s).*^——$\R\K.*$
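
In engines that lack \K and \R, such as Python's built-in re module, a capturing group can stand in for \K. The following is only a minimal sketch of the same idea, assuming Unix (\n) line endings and that the whole text is already held in a string; the sample text is made up for illustration.

```python
import re

# Capture group 1 replaces \K; "\n" replaces \R (assumes Unix line endings).
pattern = re.compile(r".*^——$\n(.*)", re.DOTALL | re.MULTILINE)

# Made-up sample text, shortened from the question.
sample = "TITLE\n——\n1 nuit\ntotal : 1 nuits\n——\n1 nuit\n1 nuit\n1 nuit"

match = pattern.search(sample)
tail = match.group(1) if match else ""
print(len(tail.splitlines()))  # 3
```

Counting the lines of the captured tail with splitlines() gives the number the question asks for, without needing one match per line.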


If you want separate matches (to count them), you could write the pattern as:

(?:[\s\S]*^——$|\G(?!^))\R\K.+

The pattern matches:

  • (?: Non capture group
    • [\s\S]* Match any character, including newlines, as many times as possible (the engine then backtracks to the last line that is exactly ——)
    • ^——$ Match —— on a single line
    • | Or
    • \G(?!^) Assert the current position at the end of the previous match, not at the start
  • ) Close the non capture group
  • \R\K Match a newline and then forget what is matched so far
  • .+ Match a whole line with one or more characters
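
The \G-based pattern depends on a PCRE-style engine. Where only plain string handling is available, roughly the same result can be sketched in Python: find the last line that is exactly the delimiter, then collect the following lines until the first empty one, which is where the chain of .+ matches would also stop. The function name lines_after_last_delimiter is hypothetical, and stripping trailing whitespace before comparing with the delimiter is an assumption that is more lenient than ^——$.

```python
def lines_after_last_delimiter(text, delimiter="——"):
    """Return the consecutive non-empty lines after the last delimiter line."""
    lines = text.splitlines()
    # Indices of lines that consist only of the delimiter
    # (trailing whitespace is tolerated; this is an assumption).
    hits = [i for i, line in enumerate(lines) if line.rstrip() == delimiter]
    if not hits:
        return []
    result = []
    for line in lines[hits[-1] + 1:]:
        if not line:
            break  # like .+, stop at the first empty line
        result.append(line)
    return result


# Made-up sample text, shortened from the question.
sample = "TITLE\n——\n1 nuit \ntotal : 1 nuits \n——\n1 nuit \n1 nuit\n1 nuit\n\nand so on ..."
print(len(lines_after_last_delimiter(sample)))  # 3
```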


Or, if a line consisting only of —— (for example a dangling one at the bottom) must not be matched:

(?:[\s\S]*^——$|\G(?!^))\R\K(?!^——$).+


For example, in a tool like Notepad++:

[Screenshot: the pattern applied in Notepad++]

Luuk

A solution using awk:

awk  "!/ /{ n=0 }/ /{ n++ }END{if(n!=0){ print \"total :\", n, \"nuits\";}}"  nuits.txt

When input is named nuits.txt, this will produce:

total : 6 nuits

NOTE: I am testing for a space in the lines, because the UTF-8 character is hard to match on a command-line in Windows.
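
For readers without awk, the same reset-and-count logic can be sketched in Python. Unlike the command above, this version compares each line with the delimiter directly instead of testing for a space, which is an assumption on my part; the function name count_last_block is hypothetical.

```python
def count_last_block(lines, delimiter="——"):
    """Count the non-empty lines seen since the most recent delimiter line."""
    n = 0
    for line in lines:
        stripped = line.strip()
        if stripped == delimiter:
            n = 0      # a new block starts: reset the counter
        elif stripped:
            n += 1     # count every non-empty line of the current block
    return n


# Made-up sample lines, shortened from the question.
sample_lines = ["TITLE", "——", "1 nuit ", "total : 1 nuits ", "——",
                "1 nuit ", "1 nuit", "1 nuit"]
print(f"total : {count_last_block(sample_lines)} nuits")  # total : 3 nuits
```

Like the awk version, this counts every line in the last block, so any trailing text after the last "nuit" line would be included in the total.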

ArtyLee

This would be a solution in Python. It reads the file input.txt and counts the matching lines in each block delimited by "——", then writes the number of lines found per block to output.txt. I hope this meets your needs.

'''
Parses the file input.txt in the same path
as this Python file
and counts the lines matching keyword 2 (kw2)
in each block that starts at keyword 1 (kw1).
'''
import pathlib
import re

kw1 = "——"
kw2 = r"^\d+ nuit"  # raw string, so \d is passed to the regex engine unchanged

def write_to_output(res_kw):
    # Append one count per line to output.txt
    with open("output.txt", "a") as fout:
        for cnt in res_kw:
            fout.write(str(cnt) + "\n")

# Directory that contains this script
path = pathlib.Path(__file__).parent.resolve()
with open(path / "input.txt", mode="r", encoding="utf-8") as finput:
    lines = finput.readlines()

# Line indices of the delimiter lines
block = [i for i, line in enumerate(lines) if re.search(kw1, line)]
# Sentinel index, so the block after the last delimiter is counted too
block.append(len(lines))

# One count per block: lines between two consecutive indices that match kw2
nuit = []
for start, end in zip(block, block[1:]):
    nuit.append(sum(1 for line in lines[start:end] if re.search(kw2, line)))

write_to_output(nuit)

For the sample input, output.txt then contains the number of "x nuit" lines in each block, not counting the "total ..." lines: 5, 3 and 6.