How to unnest a 2d array into a 1d array quickly in PostgreSQL?

I have a really large 2d array that I computed with Apache MADlib, and I would like to apply an operation to each 1d array inside it.

I have found code that can help me unnest it from this related answer. However, the code is miserably slow on this really large 2d array (150,000+ 1d float arrays). While plain unnest() takes only a few seconds to run, that code has not completed even after several minutes of waiting.

Surely there must be a faster way to unnest the large 2d array into smaller 1d arrays? Bonus points if the solution uses Apache MADlib. I did find one lead buried in the documentation, a function called deconstruct_2d_array. However, when I try to call it on the matrix, it fails with the following error:

ERROR: Function "deconstruct_2d_array(double precision[])": Invalid type conversion. Internal composite type has more elements than backend composite type.

There are 2 answers

Erwin Brandstetter (best answer)

The function you found in my old answer does not scale well for big arrays. I never thought of arrays of your size, which should probably be a set (a table) instead.

Be that as it may, this PL/pgSQL function can replace the one in the referenced answer. Requires Postgres 9.1 or later.

CREATE OR REPLACE FUNCTION unnest_2d_1d(ANYARRAY, OUT a ANYARRAY)
  RETURNS SETOF ANYARRAY
  LANGUAGE plpgsql IMMUTABLE STRICT AS
$func$
BEGIN
   FOREACH a SLICE 1 IN ARRAY $1 LOOP
      RETURN NEXT;
   END LOOP;
END
$func$;

In my test on a big 2d array in Postgres 9.6, it was 40x faster than the old function.

The function is declared STRICT to avoid an exception for NULL input (as commented by IamIC):

ERROR: FOREACH expression must not be null
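
As a minimal usage sketch, assuming unnest_2d_1d() has been created as shown above: the query below splits a small literal into its 1d rows and applies an operation (here a sum) to each one. The array literal and the column names row_1d and row_sum are illustrative only and are not part of the original answer.

```sql
-- Assumes the unnest_2d_1d() function defined above exists.
-- Each returned row holds one 1d slice of the 2d input.
SELECT r.a AS row_1d,
       -- per-slice operation: sum the elements of this 1d array
       (SELECT sum(x) FROM unnest(r.a) AS x) AS row_sum
FROM   unnest_2d_1d('{{1,2,3},{4,5,6}}'::float8[]) AS r(a);
```

This should return two rows, {1,2,3} with a sum of 6 and {4,5,6} with a sum of 15. For an array stored in a table column, the same call can go in the FROM list after the table (an implicit LATERAL join, which needs Postgres 9.3 or later).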

Frank McQuillan

There is now a built-in MADlib function to do this - array_unnest_2d_to_1d, which was introduced in the 1.11 release: http://madlib.incubator.apache.org/docs/latest/array__ops_8sql__in.html#af057b589f2a2cb1095caa99feaeb3d70

Here is an example usage:

CREATE TABLE test1 (pid int, points double precision[]);
INSERT INTO test1 VALUES
(100,  '{{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}, {7.0, 8.0, 9.0}}'),
(101,  '{{11.0, 12.0, 13.0}, {14.0, 15.0, 16.0}, {17.0, 18.0, 19.0}}'),
(102,  '{{21.0, 22.0, 23.0}, {24.0, 25.0, 26.0}, {27.0, 28.0, 29.0}}');
SELECT * FROM test1;

produces

 pid |               points               
-----+------------------------------------
 100 | {{1,2,3},{4,5,6},{7,8,9}}
 101 | {{11,12,13},{14,15,16},{17,18,19}}
 102 | {{21,22,23},{24,25,26},{27,28,29}}
(3 rows)

Then call the unnest function:

SELECT pid, (madlib.array_unnest_2d_to_1d(points)).* 
FROM test1 ORDER BY pid, unnest_row_id;

produces

 pid | unnest_row_id | unnest_result 
-----+---------------+---------------
 100 |             1 | {1,2,3}
 100 |             2 | {4,5,6}
 100 |             3 | {7,8,9}
 101 |             1 | {11,12,13}
 101 |             2 | {14,15,16}
 101 |             3 | {17,18,19}
 102 |             1 | {21,22,23}
 102 |             2 | {24,25,26}
 102 |             3 | {27,28,29}
(9 rows)

where unnest_row_id is the 1-based index of each row within the original 2D array.
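
Since the goal in the question is to apply an operation to each 1d array, here is a hedged sketch of one way to do that on top of the call above. It assumes MADlib 1.11 or later is installed in a schema named madlib and that the test1 table from this answer exists; the row_sum column and the choice of sum() are illustrative.

```sql
-- Wrap the MADlib call in a subquery so its output columns
-- (unnest_row_id, unnest_result) can be referenced by name.
SELECT s.pid,
       s.unnest_row_id,
       -- per-row operation: sum the elements of each 1d array
       (SELECT sum(x) FROM unnest(s.unnest_result) AS x) AS row_sum
FROM  (
        SELECT pid, (madlib.array_unnest_2d_to_1d(points)).*
        FROM   test1
      ) AS s
ORDER  BY s.pid, s.unnest_row_id;
```

With the sample data above, the first three rows (pid 100) should have row_sum values of 6, 15 and 24.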