Return PostgreSQL hstore as OrderedDict using psycopg2 in Python

3.2k views Asked by At

A PostgreSQL hstore maintains order between storage and retrieval. This allows one to define the order in which the keys/values within the hstore are stored and retrieved.

Unfortunately, psycopg2's implementation uses a hard-coded dict under extras.py -> HstoreAdapter.parse().

While in most scenarios parsing an hstore to a dict suites just fine, in our scenario this causes problems; we specifically need to maintain ordering.

One work-around I've derived is querying for keys and values as separate lists:

SELECT AKEYS(myhstore) AS keys, AVALS(mystoore) AS vals FROM mytbl

... However that solution treats the symptom of the problem rather than the cause.

Is anyone aware of a monkey-patch solution to this issue, or a branch of psycopg2 which deals with this issue, or of an implementation of the extras.py file which resolves this?

Alternately, does anyone have any other suggestions on how to handle this?

2

There are 2 answers

0
Uyghur Lives Matter On BEST ANSWER

NOTICE: HSTORE does not preserve order. It is unordered like Python's dict. My previous answer only worked by chance.

The order of the pairs is not significant (and may not be reproduced on output).

Example:

SELECT 'foo=>1,bar=>2,spam=>3,eggs=>4'::hstore
UNION ALL
SELECT hstore('{foo,1,bar,2,spam,3,eggs,4}'::TEXT[])
UNION ALL
SELECT hstore('{{foo,1},{bar,2},{spam,3},{eggs,4}}'::TEXT[])
UNION ALL
SELECT hstore('{foo,bar,spam,eggs}'::TEXT[], '{1,2,3,4}'::TEXT[])

All result in:

""bar"=>"2", "foo"=>"1", "eggs"=>"4", "spam"=>"3""
""bar"=>"2", "foo"=>"1", "eggs"=>"4", "spam"=>"3""
""bar"=>"2", "foo"=>"1", "eggs"=>"4", "spam"=>"3""
""bar"=>"2", "foo"=>"1", "eggs"=>"4", "spam"=>"3""

Which appears to be unordered because its order is different from the original order, and it is not alphabetically ordered.


THE BELOW DOES NOT ACTUALLY WORK!

You can maintain order in an HSTORE by using the hstore_to_matrix() function which converts the HSTORE to an array of key-value pairs. Then you have to manually pass it to OrderedDict in Python:

import collections
import psycopg2
import psycopg2.extras

pg = psycopg2.connect(...)
psycopg2.extras.register_hstore(pg)
cursor = pg.cursor()

cursor.execute("""
    SELECT hstore_to_matrix('a=>1,b=>2,c=>3'::hstore);
""")
pairs = cursor.fetchone()[0]
ordered = collections.OrderedDict(pairs)

print(pairs)
# [['a', '1'], ['b', '2'], ['c', '3']]
print(ordered)
# OrderedDict([('a', '1'), ('b', '2'), ('c', '3')])

psycopg2 ultimately calls .keys() and .values() on the dictionary when its converted back to an HSTORE which means so long as the dictionary is ordered, the HSTORE sent back to PostgreSQL will also be ordered. You just have to pass back an OrderedDict instead of a regular dict to maintain order:

# This will save the data as ordered.
data = OrderedDict([('a', '1'), ('b', '2'), ('c', '3')])
cursor.update("""
    UPDATE mytable
    SET ordered = %(data)s;
""", {
    'data': data
})

# This will save the data as unordered. Whatever the internal order
# happens to be for the dict will be sent. When I run it in the python
# interpreter, it results in:
#   [('a', '1'), ('c', '3'), ('b', '2')]
cursor.update("""
    UPDATE mytable
    SET ordered = %(data)s;
""", {
    'data': data
})
2
jpwagner On

you could dump to string and store as text

then you could specify object_pairs_hook in the json decoder: http://docs.python.org/2/library/json.html#encoders-and-decoders

In [1]: json.JSONDecoder(object_pairs_hook=collections.OrderedDict).decode('{"foo":1, "bar": 2}')
Out[1]: OrderedDict([('foo', 1), ('bar', 2)])