My table in a PostgreSQL database stores a large number of integer arrays in the column called vector
. Since all the numbers fit into the range from -128 to 127 and to save space, I use the bytea
type. I need to use the dot product distance operator <#>
provided by the pgvector
extension, e.g.:
select id
from MyTable
order by vector<#>key
limit 10;
However, this operator does not support the bytea
type. I want to convert my vectors to int[]
, which can then be cast to the vector
type supported by pgvector
.
I think the general direction for converting bytea
to int[]
might be along the lines of:
SELECT string_to_array(encode(vector, 'escape'), '\\')::int[]
FROM MyTable;
How do I modify this to make the conversion work?
You can join with
generate_series()
and use theget_byte()
function:This is going to be slow and inefficient, but that is to be expected if you try to save space at all costs.
I would recommend to use a
smallint[]
, which uses 2 bytes per number.If you are running PostgreSQL v15 or later, you could also use an array of the internal data type
"char"
, which takes only one byte per element:To get an individual element, you use
A
"char"[]
uses one byte per element plus an overhead of 21 bytes per array (abytea
has an overhead of one or four bytes per value, depending on the size).