Extract an HTML tag that contains a string in OpenRefine?

Question

589 views Asked by treakec At 13 June 2015 at 09:19

There is not much to add to the title; that is what I'm trying to do. Any suggestions?

I reviewed the docs on GitHub and googled extensively.

The best I've got is:

value.parseHtml().select('p[contains('xyz')]')

It results in a syntax error.

There are 2 answers

**Owen Stephens** · Answer 1 · 2015-06-13T09:44:11+00:00

The 'select' syntax is based on the selector syntax in jsoup (http://jsoup.org/cookbook/extracting-data/selector-syntax).

In this case I believe the syntax you need is:

value.parseHtml().select("p:contains(xyz)")

Owen
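For context, the expression in the question fails for two reasons: the single quotes around xyz end the single-quoted selector string early, and jsoup expresses "contains text" as a pseudo-selector (:contains) rather than in square brackets, which are reserved for attributes. A minimal sketch of two variants follows, assuming the cell holds HTML and xyz is a placeholder for the text you are searching for. These are two separate expressions, to be tried one at a time:

```
value.parseHtml().select("p:contains(xyz)")
value.parseHtml().select("p:containsOwn(xyz)")
```

Per the jsoup selector documentation, :contains matches case-insensitively against the text of the element and its descendants, while :containsOwn only considers text directly inside the element itself. Which one fits depends on how deeply the text is nested in your markup.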

**Thad Guidry** · Answer 2 · 2015-06-15T01:58:53+00:00

Perhaps you missed my writeup (and WARNING) on the wiki? :) It is here:

https://github.com/OpenRefine/OpenRefine/wiki/StrippingHTML#extract-html-attributes-text-links-with-integrated-grel-jsoup-commands

WARNING: Make sure to use .toString() suffixes when needed to output strings into Refine cells while working with the built-in HTML GREL commands (the default output is org.jsoup.nodes objects). Otherwise you'll get a preview just fine in the Expression Editor, BUT no data shown in the Refine cells when you apply it!

BTW, how and where could we make the docs better, so that no one misses this in the future?

I even gave folks a nice example in our docs that shows using .toString(): https://github.com/OpenRefine/OpenRefine/wiki/GREL-Other-Functions#selectelement-e-string-s
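To tie the warning to the selector from the question, here is a rough sketch of how the pieces might be combined. It assumes the cell contains HTML, that at least one p element matches (indexing with [0] will produce an error otherwise), and that xyz and the " | " separator are placeholders. Each line is a separate expression:

```
value.parseHtml().select("p:contains(xyz)")[0].toString()
value.parseHtml().select("p:contains(xyz)")[0].htmlText()
forEach(value.parseHtml().select("p:contains(xyz)"), e, e.toString()).join(" | ")
```

The first should store the whole matching tag as a string, the second only its text, and the third all matching tags joined into one cell. In each case the element is converted to a string before it reaches the cell, which is the point of the warning above.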
