Return max by two columns within dataset?

8.2k views Asked by At

The problem I am facing is I am trying to query SAP HANA to bring back a list of unique codes that refer to one instance of a change being made to a database. For a bit of background to the below image, each change has a relevant Site ID and Product No. that I am using together as variables, in order to find out the TS Number for the most recent date.

However, when I use the SELECT MAX(DATAB) function, it forces me to use aGROUP BY clause. But, because I cannot omit the TS Number from the GROUP BY clause, it returns all three.

Is there a way to get the max date, for any given combination of Product No. and Site ID, and only return the TS Number for that date? In this example, it would be fine to use TOP 1 but this is just a scaled-down example from a query that will look at many combinations of Product No. and Site ID (with the desired outcome being a list of all of the TS Numbers that relate to the most recent change for that product/store combination, that I will use for a join to another query).

Any help would be appreciated. If full table design etc. is required so that people can attempt to replicate the problem I will happily provide this but am hoping there's a simple solution I have not thought of...

Many thanks

1

There are 1 answers

0
Christoph G On BEST ANSWER

As in any other SQL-DB that supports window functions, you can use row_number() or rank() function to get the desired result. Which one to use depends on how you want to handle tie values.

If you just want exactly one TS-Number in case there are more than one TS-Number for the same MAXDATE, use the following SQL:

select dat, ts_nr, pr_nr, site
  from
  (select *, row_number() over ( partition by pr_nr, site order by dat desc ) rownum 
      from mytab
  )
    where rownum = 1;

Be aware, that the result is non-deterministic. However, you can (should in most cases!) make it deterministic by adding ts_nr to the order by in the window order by clause. Then you get either the highest or lowest TS-Number for the same MAXDATE, depending on the sort order.

If you want all TS-Numbers in case there are several TS-Numbers for the same MAXDATE, use rank() instead of row_number(), like this:

select dat, ts_nr, pr_nr, site
  from
  (select *, rank() over ( partition by pr_nr, site order by dat desc ) ranknum
      from mytab
  )
    where ranknum = 1;