R: Remove end of text after matching string

Question

180 views Asked by Francis Smart At 21 June 2015 at 23:49

I would like to remove any text that appears after a match of either THE END or FINIS. I know this is very similar to this other topic, but I am just not skilled enough in regex to make this work for me.

My text is Shakespeare's works taken from Project Gutenberg. They typically look something like

txt <- "... thou hast tam'd a curst shrow.   LUCENTIO. 'Tis a wonder, 
  by your leave, she will be tam'd so. Exeunt  THE END   <<THIS ELECTRONIC  VERSION OF THE 
  COMPLETE WORKS OF WILLIAM ..."

or

txt <- "... thou hast tam'd a curst shrow.   LUCENTIO. 'Tis a wonder, 
  by your leave, she will be tam'd so. Exeunt  FINIS  <<THIS ELECTRONIC  VERSION OF THE 
  COMPLETE WORKS OF WILLIAM ..."

My ideal would look something like gsub("^[THE END]*|^[FINIS]*", "", txt) returning "... thou hast tam'd a curst shrow. LUCENTIO. 'Tis a wonder, by your leave, she will be tam'd so. Exeunt".

There is 1 answer

**Federico Piazza** · Accepted Answer · 2015-06-22T00:03:03+00:00

You are pretty close. You just need to use:

gsub("(THE END|FINIS).*", "", txt)

By the way, as thelatemail pointed out in his comment, sub would be enough, since only one replacement is needed.
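
For reference, the pattern in the question, ^[THE END]*, treats the bracketed letters as a character class anchored at the start of the string, so it never looks for the marker itself. Here is a minimal sketch of the sub version. The sample string is a shortened stand-in for the text in the question, and the leading \\s* is an optional addition (not part of the pattern above) that also drops the whitespace just before the marker. It relies on . matching newlines in R's default regex engine; with perl = TRUE the (?s) flag is needed for the same behaviour.

```r
# Shortened stand-in for the text in the question ("..." marks elided content).
txt <- "... she will be tam'd so. Exeunt  THE END   <<THIS ELECTRONIC  VERSION OF THE\n  COMPLETE WORKS OF WILLIAM ..."

# Remove the first marker and everything after it.
# \\s* (an optional addition) also removes whitespace right before the marker.
cleaned <- sub("\\s*(THE END|FINIS).*", "", txt)

# Same idea with the PCRE engine; (?s) lets . match newlines there.
cleaned_pcre <- sub("(?s)\\s*(THE END|FINIS).*", "", txt, perl = TRUE)

print(cleaned)
```

If the marker words could also occur earlier in the play's text, the pattern would cut at the first occurrence, so it may be worth checking the result against the end of each file.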
