How to fix 'must be an aggregate expression or appear in GROUP BY clause' with 'as' clause in Presto?

909 views Asked by At

This question is similar to another one, but I'm providing a simpler example. The other query was too advanced to make sense to me.

Sample (fake) data

id gender kg
4f5a07ca-02e0-8981-3c30-4d9924a169a3 male 103
4f5a07ca-02e0-8981-3c30-4d9924a169a3 male 85
4f5a07ca-02e0-8981-3c30-4d9924a169a3 male 469
e05d54e9-8292-b26c-5618-8a3712b4fc44 female 33

Desired outcome

id gender kg
4f5a07ca-02e0-8981-3c30-4d9924a169a3 male 85
e05d54e9-8292-b26c-5618-8a3712b4fc44 female 33
e05d54e9-8292-b26c-5618-8a3712b4fc44 female 36
01f8bbfd-cfc6-3b97-8bc1-8da6f0b4a9a8 female 92

(Goal is having the same id only show up once, and just picking the first match, given an ordering by kg)

QUERY:

SELECT 
    p.id,
    p.gender,
    p.kg
FROM patient p
ORDER BY p.kg
GROUP BY 1

Error:

'p.gender' must be an aggregate expression or appear in GROUP BY clause

And if I change it to GROUP BY 1, 2, I get the same error, one column over:

'p.kg' must be an aggregate expression or appear in GROUP BY clause

How can I solve this?

3

There are 3 answers

0
Guru Stron On BEST ANSWER

"Ordering by kg" combined with selecting only one value sounds awfully like min/max aggregate function. In this case min (for gender you can use min_by or arbitrary):

SELECT 
    p.id,
    min_by(p.gender, p.kg),
    min(p.kg)
FROM patient p
GROUP BY p.id
1
Gerballi On

You can use an aggregate function to get min/max/avg kg. Otherwise you need to replace p.kg with a subquery returning a single version, something like:

SELECT 
    p.id,
    p.gender,
    (select top 1 p.kg from patient where patient.id=p.id)
FROM patient p
ORDER BY p.id
2
Haleemur Ali On

what you need here is a window function first_value combined with distinct, rather than group-by / aggregate.

the window function + distinct pattern can sometimes be used to replicate aggregate query

select distinct 
  id
, first_value(gender) over (partition by id order by kg) gender
, first_value(kg) over (partition by id order by kg) kg
from patient