JSONB: more than one row returned by a subquery used as an expression

5.2k views Asked by At

I am (still) new to postgresql and jsonb. I am trying to select some records from a subquery and am stuck. My data column looks like this (jsonb):

{"people": [{"age": "50", "name": "Bob"}], "another_key": "no"}
{"people": [{"age": "73", "name": "Bob"}], "another_key": "yes"}

And here is my query. I want to select all names that are "Bob" whose age is greater than 30:

SELECT * FROM mytable
WHERE (SELECT (a->>'age')::float
       FROM (SELECT jsonb_array_elements(data->'people') as a
             FROM mytable) as b 
       WHERE a @> json_object(ARRAY['name', 'Bob'])::jsonb
      ) > 30;

I get the error:

more than one row returned by a subquery used as an expression

I don't quite understand. If I do some simple substitution (just for testing) I can do this:

SELECT * FROM mytable
WHERE  (50) > 30  -- 50 is the age of the youngest Bob

and that returns both rows.

3

There are 3 answers

2
Erwin Brandstetter On BEST ANSWER

The error means just what it says:

more than one row returned by a subquery used as an expression

The expression in the WHERE clause expects a single value (just like you substituted in your added test), but your subquery returns multiple rows. jsonb_array_elements() is a set-returning function.

Assuming this table definition:

CREATE TABLE mytable (
  id   serial PRIMARY KEY
, data jsonb
);

The JSON array for "people" wouldn't make sense if there couldn't be multiple persons inside. Your examples with only a single person are misleading. Some more revealing test data:

INSERT INTO mytable (data)
VALUES
  ('{"people": [{"age": "55", "name": "Bill"}], "another_key": "yes"}')
, ('{"people": [{"age": "73", "name": "Bob"}], "another_key": "yes"}')
, ('{"people": [{"age": "73", "name": "Bob"}
               ,{"age": "77", "name": "Udo"}], "another_key": "yes"}');

The third row has two people.

I suggest a query with a LATERAL join:

SELECT t.id, p.person
FROM   mytable t 
     , jsonb_array_elements(t.data->'people') p(person)  -- implicit LATERAL
WHERE (t.data->'people') @> '[{"name": "Bob"}]'
AND    p.person->>'name' = 'Bob'
AND   (p.person->>'age')::int > 30;

fiddle

The first WHERE condition WHERE (t.data->'people') @> '[{"name": "Bob"}]' is logically redundant, but it helps performance by eliminating irrelevant rows early: don't even unnest JSON arrays without a "Bob" in it.

For big tables, this is much more efficient with a matching index. If you run this kind of query regularly, you should have one:

CREATE INDEX mytable_people_gin_idx ON mytable
USING gin ((data->'people') jsonb_path_ops);

Related, with more explanation:

In Postgres 12 or later consider using SQL/JSON path expressions instead. See:

1
Renzo On

The correct query for your example is the following:

SELECT * 
FROM mytable
WHERE (data  #> '{people,0}' ->>'name') = 'Bob' 
AND (data  #> '{people,0}' ->>'age')::integer > 30

(note that the value of "people" is an array).

0
Dan Getz On

In your subquery:

SELECT (a->>'age')::float
FROM (SELECT jsonb_array_elements(data->'people') as a
      FROM mytable) as b
WHERE a @> json_object(ARRAY['name', 'Bob'])::jsonb

You selected all the rows of mytable all over again. That's why your subquery returns multiple values.

If you want to select rows from your table containing an element that satisfies certain conditions, then in your conditions, don't re-select from that table; use the row you've already selected in your outer query:

SELECT * FROM mytable
WHERE exists(SELECT 1
    FROM (SELECT jsonb_array_elements(data->'people') as person) as x
    WHERE person @> '{"name": "Bob"}'
    AND (person->>'age')::float > 30)

As far as I can tell, that weird double-subquery syntax is necessary. Note that data is from the outer query.

If you wanted to select all the JSON objects from the "people" fields that satisfy your conditions, then just aggregate all those "people" elements and filter them:

SELECT person
FROM (SELECT jsonb_array_elements(data->'people') as person
      FROM mytable) as x
WHERE person @> '{"name": "Bob"}'
AND (person->>'age')::float > 30