Converting bytea to int[] in SQL in Postgres

171 views Asked by At

My table in a PostgreSQL database stores a large number of integer arrays in the column called vector. Since all the numbers fit into the range from -128 to 127 and to save space, I use the bytea type. I need to use the dot product distance operator <#> provided by the pgvector extension, e.g.:

select id
from MyTable
order by vector<#>key
limit 10;

However, this operator does not support the bytea type. I want to convert my vectors to int[], which can then be cast to the vector type supported by pgvector.

I think the general direction for converting bytea to int[] might be along the lines of:

SELECT string_to_array(encode(vector, 'escape'), '\\')::int[]
FROM MyTable;

How do I modify this to make the conversion work?

1

There are 1 answers

5
Laurenz Albe On BEST ANSWER

You can join with generate_series() and use the get_byte() function:

WITH b(b) AS (VALUES (BYTEA '\x55FF1200DEADBEEF'))
SELECT array_agg(get_byte(b.b, g.i) - 128)
FROM b
   CROSS JOIN LATERAL generate_series(0, length(b) - 1) AS g(i);

            array_agg             
══════════════════════════════════
 {-43,127,-110,-128,94,45,62,111}
(1 row)

This is going to be slow and inefficient, but that is to be expected if you try to save space at all costs.

I would recommend to use a smallint[], which uses 2 bytes per number.

If you are running PostgreSQL v15 or later, you could also use an array of the internal data type "char", which takes only one byte per element:

SELECT ARRAY[-43,127,-110,-128,94,45,62,111]::"char"[];

                 array                  
════════════════════════════════════════
 {"\\325",\x7F,"\\222","\\200",^,-,>,o}
(1 row)

To get an individual element, you use

SELECT (ARRAY[-43,127,-110,-128,94,45,62,111]::"char"[])[3]::integer;

 array 
═══════
  -110
(1 row)

A "char"[] uses one byte per element plus an overhead of 21 bytes per array (a bytea has an overhead of one or four bytes per value, depending on the size).