How to strip part of the text obtained from Web-Harvest


I am new to Web-Harvest and am using it to extract the article text from a website with the following statement:

let $text := data($doc//div[@id="articleBody"])

This is the data that the statement returns:

The Refine Spa (Furman's Mill) was built as a stone grist mill along the on a tributary of Capoolong Creek by Moore Furman, quartermaster general of George Washington's army

Notable people

Notable current and former residents of Pittstown include:

Is it possible to remove everything that comes after "Notable people" from within the configuration? If it is possible, please let me know how. Thanks.

Edit: The desired output:

The Refine Spa (Furman's Mill) was built as a stone grist mill along the on a tributary of Capoolong Creek by Moore Furman, quartermaster general of George Washington's army

Notable people

1 answer

Navin Rawat:

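You just need to wrap the expression in your let statement in substring-before:

let $text := substring-before(data($doc//div[@id="articleBody"]), 'Notable people')

This returns everything that precedes the first occurrence of 'Notable people', which gives your desired output. Pass the div itself rather than its text() children: substring-before accepts a single string, and text() can select several text nodes, which would raise a type error. Note that substring-before returns an empty string if the marker does not occur in the text.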
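
If some pages have no "Notable people" section, substring-before on its own would leave $text empty. The following is a minimal sketch of one way to guard against that, assuming $doc is bound to the parsed page as in the question. The names $body and $marker are introduced here for illustration only and are not part of the original configuration.

```xquery
(: Sketch only: $doc is assumed to hold the parsed page, as in the question. :)
(: [1] keeps the first matching div, so string() receives at most one node. :)
let $body := string(($doc//div[@id="articleBody"])[1])
let $marker := 'Notable people'
let $text :=
  if (contains($body, $marker))
  then substring-before($body, $marker)  (: keep only what precedes the marker :)
  else $body                             (: marker absent: keep the whole article :)
return normalize-space($text)
```

normalize-space trims the result and collapses every run of whitespace, including line breaks, into a single space. Leave it out if the paragraph breaks in the article should be preserved.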