What encoding does the ScraperWiki datastore expect?

While writing a scraper on ScraperWiki, I was repeatedly getting this message when trying to save a UTF8-encoded string:

    UnicodeDecodeError('utf8', ' the \xe2...', 49, 52, 'invalid data')

I eventually worked out, by trial and UnicodeDecodeError, that the ScraperWiki datastore seems to expect Unicode.

So I'm now decoding from UTF-8 and converting everything to Unicode immediately before saving to the datastore:

    try:
        for k, v in record.items():
            record[k] = unicode(v.decode('utf-8'))
    except UnicodeDecodeError:
        print "Record %s, %s has encoding error" % (k, v)
    scraperwiki.datastore.save(unique_keys=["ref_no"], data=record)

This avoids the error, but is it sensible? Can anyone confirm what encoding the ScraperWiki datastore supports?
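
For what it's worth, a slightly more defensive sketch of the same idea (illustrative only, not what my scraper actually runs) would leave values that are already unicode, or that aren't strings at all, untouched:

    for k, v in record.items():
        if isinstance(v, str):  # byte string: decode it from UTF-8
            try:
                record[k] = v.decode('utf-8')
            except UnicodeDecodeError:
                print "Record %s, %s has encoding error" % (k, v)
        # unicode (and non-string) values are left as they are
    scraperwiki.datastore.save(unique_keys=["ref_no"], data=record)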

Thanks!

1 Answer

Answered by frabcus:

The datastore requires either UTF-8 byte strings or Unicode strings.

This example shows both ways of saving a pounds sterling currency sign in Python:

http://scraperwiki.com/scrapers/unicode_test/
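
For reference, a minimal sketch of the two approaches (the exact contents of the linked scraper aren't reproduced here, and the key/column names are made up):

    # -*- coding: utf-8 -*-
    import scraperwiki

    # 1. Pound sign as a UTF-8 encoded byte string
    scraperwiki.datastore.save(unique_keys=["id"],
                               data={"id": 1, "currency": "\xc2\xa3"})

    # 2. Pound sign as a Unicode string
    scraperwiki.datastore.save(unique_keys=["id"],
                               data={"id": 2, "currency": u"\u00a3"})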

The same applies in other languages.

For debugging purposes, you can print strings that are neither UTF-8 nor Unicode to the console; any characters it doesn't understand are stripped.
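
For example (a toy snippet, not from the scraper above), printing a byte string containing an invalid UTF-8 sequence won't raise an error on the console:

    bad = ' the \xe2 pound'   # truncated UTF-8 sequence, like the one in the question
    print bad                  # prints, with the unreadable byte stripped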