Convert bytea to ndarray of np.float32


I have an ndarray of np.float32 that is saved in a Postgres database in the bytea format:

import pandas as pd
import numpy as np
import sqlite3

myndarray=np.array([-3.55219245e-02, 1.33227497e-01, -4.96977456e-02, 2.16857344e-01], dtype=np.float32)
myarray=[myndarray.tobytes()]
mydataframe=pd.DataFrame(myarray, columns=['Column1'])
mydataframe.to_sql('mytable', sqlite3.connect("/tmp/floats.sqlite"))

In SQLite3, this produces:

CREATE TABLE IF NOT EXISTS "mytable" ("index" INTEGER, "Column1" TEXT);
INSERT INTO mytable VALUES(0,X'707f11bdca6c083edd8f4bbdda0f5e3e');

In PostgreSQL, this produces:

mydatabase=# select * from mytable;
 index |              Column1
-------+------------------------------------
     0 | \x707f11bdca6c083edd8f4bbdda0f5e3e

That format is bytea. How can I convert that \x707f... back to myndarray? I'm no expert here; I've found a lot of obscure documentation about frombuffer(), the Python 2 buffer(), and memoryview(), but I am far from a proper result.

My best so far is:

np.frombuffer(bytearray('707f11bdca6c083edd8f4bbdda0f5e3e', 'utf-8'), dtype=np.float32)

which is completely wrong (myndarray has 4 values, but this returns 8):

[2.1627062e+23 1.6690035e+22 3.3643249e+21 5.2896255e+22 2.1769183e+23
 1.6704162e+22 2.0823326e+23 5.2948159e+22]

There is 1 answer

RodolfoAP On

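After a lot of trial and error (again, I don't know Python), I've found a solution: decode the hex digits into raw bytes with bytes.fromhex(), then pass those bytes to np.frombuffer().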

ndarray=np.frombuffer(bytes.fromhex("707f11bdca6c083edd8f4bbdda0f5e3e"), np.float32)

print(ndarray)
# [-0.03552192  0.1332275  -0.04969775  0.21685734]

print(type(ndarray))
# <class 'numpy.ndarray'>

print(type(ndarray[0]))
# <class 'numpy.float32'>
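
For anyone wondering why the attempt in the question returned 8 values instead of 4, here is a small sketch of the difference. Encoding the hex string as UTF-8 gives one byte per character (32 bytes, so 8 float32 values), while bytes.fromhex() turns each pair of hex digits into one byte (16 bytes, so 4 float32 values). The variable names are only for illustration.

```python
import numpy as np

hex_text = "707f11bdca6c083edd8f4bbdda0f5e3e"

# One byte per character: the ASCII codes of '7', '0', '7', 'f', ...
as_ascii = hex_text.encode("utf-8")

# One byte per pair of hex digits: 0x70, 0x7f, 0x11, 0xbd, ...
as_raw = bytes.fromhex(hex_text)

wrong = np.frombuffer(as_ascii, dtype=np.float32)
right = np.frombuffer(as_raw, dtype=np.float32)

print(len(as_ascii), wrong.size)  # 32 bytes -> 8 values
print(len(as_raw), right.size)    # 16 bytes -> 4 values
```

Note that np.frombuffer() interprets the bytes in the machine's native byte order unless the dtype says otherwise, so this assumes the array is read on the same kind of machine that wrote it.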

Now, the full example:

import numpy as np
import pandas as pd
from sqlalchemy import create_engine

ngine=create_engine('postgresql://postgres:mypassword@localhost/mydatabase')
myndarray=np.array([-3.55219245e-02, 1.33227497e-01, -4.96977456e-02, 2.16857344e-01], dtype=np.float32)

print(myndarray)
# [-0.03552192  0.1332275  -0.04969775  0.21685734]

myarray=[myndarray.tobytes()]
mydataframe=pd.DataFrame(myarray, columns=['Column1'])
mydataframe.to_sql('mytable', ngine)
# SELECT * FROM mytable
#
#index  Column1
#    0  \x707f11bdca6c083edd8f4bbdda0f5e3e

bytea=pd.read_sql(sql='SELECT * FROM mytable', con=ngine).iloc[0]['Column1']
print(type(bytea))
# <class 'str'>

print(bytea)
# \x707f11bdca6c083edd8f4bbdda0f5e3e

print(bytea[2:])
# 707f11bdca6c083edd8f4bbdda0f5e3e
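# Surprise! The \x is not interpreted as an escape sequence: the value is
# plain text that starts with a literal backslash and an 'x', so strip it.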

ndarray=np.frombuffer(bytes.fromhex(bytea[2:]), np.float32)
print(ndarray)
# [-0.03552192  0.1332275  -0.04969775  0.21685734]
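
The steps above could be wrapped in a small helper. This is only a sketch: bytea_to_ndarray is a made-up name, not a NumPy or pandas function. It assumes the value arrives either as hex text (with or without the leading \x, as in my case) or as a bytes-like object, since some database drivers may hand back bytes or a memoryview when the column really is a binary type. The SQLite schema in the question shows Column1 created as TEXT, which is probably why I get a str here.

```python
import numpy as np

def bytea_to_ndarray(value, dtype=np.float32):
    """Hypothetical helper: decode a stored bytea value into a 1-D array."""
    if isinstance(value, str):
        # Text form: drop the literal backslash-x prefix if it is there,
        # then turn each pair of hex digits into one byte.
        hex_text = value[2:] if value.startswith("\\x") else value
        raw = bytes.fromhex(hex_text)
    else:
        # bytes, bytearray or memoryview
        raw = bytes(value)
    # frombuffer returns a read-only view; copy() gives a writable array.
    return np.frombuffer(raw, dtype=dtype).copy()

text_value = "\\x707f11bdca6c083edd8f4bbdda0f5e3e"
print(bytea_to_ndarray(text_value))
```

The dtype passed in has to match the one used when the array was written with tobytes(), because the stored bytes carry no type or shape information.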

Thanks, @hpaulj, @PranavHosangadi @juanpa.arrivillaga, @ACarter.