While writing a scraper on ScraperWiki, I was repeatedly getting this message when trying to save a UTF-8-encoded string:
UnicodeDecodeError('utf8', ' the \xe2...', 49, 52, 'invalid data')
I eventually worked out, by trial and UnicodeDecodeError, that the ScraperWiki datastore seems to expect Unicode.
So I'm now decoding from UTF-8 and converting everything to Unicode immediately before saving to the datastore:
import scraperwiki

# Decode every value from UTF-8 to Unicode before saving to the datastore
try:
    for k, v in record.items():
        record[k] = unicode(v.decode('utf-8'))
except UnicodeDecodeError:
    print "Record %s, %s has encoding error" % (k, v)
scraperwiki.datastore.save(unique_keys=["ref_no"], data=record)
This avoids the error, but is it sensible? Can anyone confirm what encoding the ScraperWiki datastore supports?
Thanks!
The datastore requires either UTF-8 byte strings or Unicode strings.
This example shows both ways of saving a pound sterling currency sign in Python:
http://scraperwiki.com/scrapers/unicode_test/
The same applies in other languages.
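For illustration, here is a minimal sketch of the two forms in Python 2 (not the linked scraper itself; the "id" key and the table layout are assumptions made for the example):

# -*- coding: utf-8 -*-
import scraperwiki

# 1. UTF-8 byte string: '\xc2\xa3' is the UTF-8 encoding of the pound sign.
scraperwiki.datastore.save(unique_keys=["id"],
                           data={"id": 1, "currency": "\xc2\xa3"})

# 2. Unicode string: u'\xa3' is the same character as a Unicode object.
scraperwiki.datastore.save(unique_keys=["id"],
                           data={"id": 2, "currency": u"\xa3"})

Either form should save without raising a UnicodeDecodeError, so decoding to Unicode before saving, as in your snippet, is a sensible approach.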
For debugging purposes you can still print strings in other encodings to the console; any characters it can't interpret are simply stripped.