Instapaper API - /api/1/bookmarks/get_text


I'm working with the Instapaper API.

I'm using this call to pull the content of the article:

$Bookmark_Text = $connection->getBookmarkText($Bookmark['bookmark_id']);

Unfortunately, it returns the entire HTML document, so I end up with a complete HTML structure nested inside my own page's HTML.

Example:

<html>
<head></head>
<body>
    <html>
    <head>Instapaper Title</head>
    <body>InstaPaper Article Content</body>
    </html>
</body>
</html>

Any thoughts on how to get just the article content ("InstaPaper Article Content" in the example)?

Thanks!


There are 2 answers

freejosh

Use an HTML parser to extract the contents of <body>. PHP has one built in (DOMDocument), but there are third-party libraries that may be easier to use.

This should do it if $Bookmark_Text is a valid HTML document.

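libxml_use_internal_errors(true); // don't emit warnings for imperfect real-world HTML
$dom = new DOMDocument();
$dom->loadHTML($Bookmark_Text);
$body = $dom->getElementsByTagName('body')->item(0);
$content = '';
foreach ($body->childNodes as $child) { // inner HTML only, without the <body> tags themselves
    $content .= $dom->saveHTML($child);
}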

drkbrd

Here's some JavaScript that extracts only the article and strips Instapaper's own markup (for example, the top and bottom bars).

var article = html.replace(/^[\s\S]*<div id="story">|<\/div>[^<]*<div class="bar bottom">[\s\S]*$/gim, ''); // keep only what sits inside <div id="story">

Be aware that this depends on Instapaper's current HTML output and will break if that markup changes.
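A slightly more defensive take on the same idea, offered as a sketch: like the regex above, it assumes the article sits inside <div id="story"> and is followed by <div class="bar bottom">, but it returns the input unchanged when either marker is missing instead of stripping only half of the page. extractStory is a hypothetical helper name, not part of any Instapaper API.

```javascript
// Hypothetical helper. Assumes (as the regex above does) that the get_text
// output wraps the article in <div id="story"> ... </div>, followed by
// <div class="bar bottom">. Instapaper may change this markup at any time.
function extractStory(html) {
  const openTag = '<div id="story">';
  const start = html.indexOf(openTag);
  const bottom = start === -1 ? -1 : html.indexOf('<div class="bar bottom">', start);

  if (start === -1 || bottom === -1) {
    // Markers not found: return the input untouched rather than guessing.
    return html;
  }

  // Everything between the opening story tag and the bottom bar...
  const inner = html.slice(start + openTag.length, bottom);
  // ...minus the story container's own closing </div> and any trailing text.
  return inner.replace(/<\/div>[^<]*$/, '');
}
```

Using indexOf keeps each step explicit, so a missing marker is easy to detect; the regex version is shorter but silently removes whatever it happens to match.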