Percentile over in a Window function in Amazon Athena

58 views Asked by At

I am searching for a solution for a problem. I am working with Amazon Athena and i need a field that shows the percentile of days for a product between 2 timestamps. I have it working in Amazon Quicksight but because of the Limit to aggregate there i can“t use the results in other calculated fields. So my only option is to build the field in as a SQL statement directly into my dataset. But when i try to build it it always gives me errors.

I tried following statements:

approx_percentile(case when (date_diff('millisecond', b.zp8_ts,b.depein_ts) / (1E3 *3600 *24)<0)OR (date_diff('millisecond', b.zp8_ts,b.depein_ts) / (1E3 *3600 *24) >200) then null else date_diff('millisecond', b.zp8_ts,b.depein_ts) / (1E3 *3600 *24)end,20) OVER (werkziel) 

case when (date_diff('millisecond', b.vhfein_ts,b.vhfaus_ts) / (1E3 *3600 *24)<0)OR (date_diff('millisecond', b.vhfein_ts,b.vhfaus_ts) / (1E3 *3600 *24) >200) then null else Percentile_count(date_diff('millisecond', b.vhfein_ts,b.vhfaus_ts) / (1E3 *3600 *24)) OVER (PARTITION BY concat(c.pk_nr_werk_num2,c.zielort_id),20) end 

And this is the calculated field from Amazon Quicksight which is working: percentileCont({zp8_depein_90},20,[werkziel])

Sample Data:

zp8_ts depein_ts pk_num_werk_num2 zielort_id
01.01.2022 05.01.2022 05 ABC08 L
01.01.2022 05.01.2022 05 ABC44 L
01.01.2022 05.01.2022 05 ABC08 L
01.01.2022 05.01.2022 05 ABC44 L
01.01.2022 05.01.2022 05 ABC08 L
01.01.2022 05.01.2022 05 ABC44 L
01.01.2022 05.01.2022 05 ABC08 L
01.01.2022 05.01.2022 05 ABC44 L
01.01.2022 05.01.2022 05 ABC08 L
01.01.2022 05.01.2022 05 ABC44 L
01.01.2022 05.01.2022 05 ABC08 L
01.01.2022 05.01.2022 05 ABC44 L
01.01.2022 05.01.2022 05 ABC08 L
01.01.2022 05.01.2022 11 ABC44 L
01.01.2022 05.01.2022 11 ABC08 L
01.01.2022 05.01.2022 11 ABC44 L
01.01.2022 05.01.2022 11 ABC08 L
01.01.2022 05.01.2022 11 ABC44 L
01.01.2022 05.01.2022 11 ABC08 L
01.01.2022 05.01.2022 11 ABC44 L
01.01.2022 05.01.2022 11 ABC08 L
01.01.2022 05.01.2022 11 ABC44 L
01.01.2022 05.01.2022 11 ABC08 L
01.01.2022 05.01.2022 11 ABC44 L
01.01.2022 05.01.2022 11 ABC08 L
01.01.2022 05.01.2022 11 ABC44 L
01.01.2022 05.01.2022 11 ABC08 L
01.01.2022 05.01.2022 11 ABC44 L
01.01.2022 05.01.2022 11 ABC08 L
01.01.2022 05.01.2022 15 ABC44 L
01.01.2022 05.01.2022 15 ABC08 L
01.01.2022 05.01.2022 15 ABC44 L
01.01.2022 05.01.2022 15 ABC08 L
01.01.2022 05.01.2022 15 ABC44 L
01.01.2022 05.01.2022 15 ABC08 L
01.01.2022 05.01.2022 15 ABC44 L
01.01.2022 05.01.2022 15 ABC08 L
01.01.2022 05.01.2022 15 ABC44 L
01.01.2022 05.01.2022 15 ABC08 L
01.01.2022 05.01.2022 15 ABC44 L
01.01.2022 05.01.2022 15 ABC08 L
01.01.2022 05.01.2022 15 ABC44 L
01.01.2022 05.01.2022 15 ABC08 L
01.01.2022 05.01.2022 15 ABC44 L
01.01.2022 05.01.2022 15 ABC08 L
1

There are 1 answers

0
Oliver On

I have found a solution for this problem. There was a particion by missing after the over.

approx_percentile(
CASE WHEN ( 
  date_diff('millisecond', b.vhfaus_ts,b.depein_ts) / (1E3 *3600 *24) < 0
  ) OR (
    date_diff('millisecond', b.vhfaus_ts,b.depein_ts) / (1E3 *3600 *24) >200
  ) THEN NULL 
  ELSE date_diff('millisecond', b.vhfaus_ts,b.depein_ts) / (1E3 *3600 *24) end, 0.8) OVER (**PARTITION by** concat(c.pk_nr_werk_num2,c.zielort_id)
) AS vhfaus_depein_p80,