Is there some sequence of tags that could possibly indicate a title on a webpage? For example, extracting the title of a book from its Amazon page, where other text/sentences may have similar sentence structures. I feel like this is an extremely fundamental task, but I cannot figure out exactly how to do it with Stanford's NER/CoreNLP.
Thanks in advance!
A solution without using the CoreNLP library: if you are looking for a title on a webpage, why not parse the <title> tag? For example, the title for the Amazon book page for the Hunger Games (http://www.amazon.com/Hunger-Games-Trilogy-Boxset/dp/0545626382/ref=sr_1_2?s=books&ie=UTF8&qid=1386299491&sr=1-2&keywords=hunger+games) is:
Of course, what the title tag contains depends on the website: it may describe the specific page, or it may just be the generic title of the overarching site.
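
As a rough illustration of the title-tag approach, here is a minimal sketch using only Python's standard-library html.parser module. The names TitleParser and extract_title are made up for this example, and the HTML string in the usage note is a hypothetical stand-in rather than the real Amazon markup; fetching the page itself is left out.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a document."""

    def __init__(self):
        super().__init__()
        self._in_title = False   # True while we are between <title> and </title>
        self._done = False       # True once the first <title> has been closed
        self._parts = []         # text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "<TITLE>" also arrives as "title".
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate and join later.
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        text = "".join(self._parts).strip()
        return text or None


def extract_title(html):
    """Return the stripped <title> text of an HTML string, or None if absent."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

Calling extract_title("<html><head><title> Example Book </title></head></html>") on that hypothetical snippet returns "Example Book". Since many sites append or prepend the site name to the title, some site-specific cleanup would probably still be needed afterwards.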