Regex Split at beginning of line containing word

I'm trying to split a text into paragraphs each time a line contains a certain word. I have managed to split the text at the beginning of that word, but not at the beginning of the line containing that word. What is the right expression?

This is what I have:

 string[] paragraphs = Regex.Split(text, @"(?=INT.|EXT.)");

I also want to lose any empty paragraphs in the array.

This is the input:

INT. LOCATION - DAY 
Lorem ipsum dolor sit amet, consectetur adipiscing elit. 

LOCATION - EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta. 

LOCATION INT. - NIGHT

I want to split it into paragraphs while keeping the same layout.

This is the result I get:

INT. LOCATION - DAY 
Lorem ipsum dolor sit amet, consectetur adipiscing elit. 

LOCATION - 

EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta. 

LOCATION 

INT. - NIGHT

The new paragraphs start at the word rather than at the beginning of its line.

This is the desired result:

Paragraph 1

INT. LOCATION - DAY 
Lorem ipsum dolor sit amet, consectetur adipiscing elit. 

Paragraph 2

LOCATION - EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta. 

Paragraph 3

LOCATION INT. - NIGHT

A paragraph should always start at the beginning of the line containing the word INT. or EXT., not at the word itself.

There is 1 answer

Nader Hisham (best answer)
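Regex.Split(text, @"(?=^.*(?:INT\.|EXT\.))", RegexOptions.Multiline);

With RegexOptions.Multiline, ^ matches at the start of every line, so the lookahead splits before any line that contains INT. or EXT. Using .* (rather than .+?) also covers lines that begin with the keyword, and \. matches a literal period. Check it against this text:

string text = "INT. LOCATION - DAY\n" +
                "Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n" +
                "LOCATION - EXT.\n" +
                "Morbi cursus dictum tempor. Phasellus mattis at massa non porta.\n" +
                "LOCATION INT. - NIGHT\n";

            // A match at position 0 puts an empty string first in the array.
            string[] res = Regex.Split(text, @"(?=^.*(?:INT\.|EXT\.))", RegexOptions.Multiline);

            int paragraphNumber = 0;
            foreach (string paragraph in res)
            {
                if (paragraph.Length == 0) continue; // drop empty paragraphs
                paragraphNumber++;
                Console.WriteLine("paragraph " + paragraphNumber + "\n" + paragraph);
            }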


// paragraph 1
// INT. LOCATION - DAY
// Lorem ipsum dolor sit amet, consectetur adipiscing elit.

// paragraph 2
// LOCATION - EXT.
// Morbi cursus dictum tempor. Phasellus mattis at massa non porta.

// paragraph 3
// LOCATION INT. - NIGHT
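
The question also asks to lose empty paragraphs, and its input has blank lines between blocks. Below is a minimal sketch of one way to handle both, not a definitive implementation. The class name SplitDemo and the helper SplitIntoParagraphs are hypothetical names introduced for illustration. It assumes that trimming leading and trailing whitespace from each paragraph is acceptable, and that the keywords are matched case-sensitively with a literal period.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Hypothetical helper: splits before every line containing INT. or EXT.
    public static string[] SplitIntoParagraphs(string text)
    {
        // ^ with RegexOptions.Multiline anchors at the start of each line.
        // .* allows the keyword anywhere in the line, including at its start.
        // \. matches a literal period rather than any character.
        string[] parts = Regex.Split(
            text, @"(?=^.*(?:INT\.|EXT\.))", RegexOptions.Multiline);

        // Assumption: surrounding whitespace is not significant, so trim it
        // and drop entries that end up empty (e.g. the one before a match
        // at position 0).
        return parts
            .Select(p => p.Trim())
            .Where(p => p.Length > 0)
            .ToArray();
    }

    public static void Main()
    {
        string text =
            "INT. LOCATION - DAY\n" +
            "Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n" +
            "\n" +
            "LOCATION - EXT.\n" +
            "Morbi cursus dictum tempor. Phasellus mattis at massa non porta.\n" +
            "\n" +
            "LOCATION INT. - NIGHT\n";

        string[] paragraphs = SplitIntoParagraphs(text);
        for (int i = 0; i < paragraphs.Length; i++)
        {
            Console.WriteLine("Paragraph " + (i + 1));
            Console.WriteLine(paragraphs[i]);
            Console.WriteLine();
        }
    }
}
```

Because the pattern is a zero-width lookahead, Regex.Split removes nothing from the text: each paragraph keeps its heading line. If the original blank lines must be preserved exactly, replace the Trim call with a filter on string.IsNullOrWhiteSpace instead.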