PostgreSQL - How to select the first consecutive group having same value

1.2k views Asked by At

I have a table with pk and dept columns:

pk dept
-------
27  A
29  A
30  B
31  B
33  A

I need to select the first consecutive group, that is the first successive set of rows all having the same dept value when the table is ordered by pk, i.e. the expected result is:

pk dept
-------
27  A
29  A

In my example there are 3 consecutive groups (AA, BB and A). The size of a group is unlimited (can be more than 2).

4

There are 4 answers

0
Fabian Pijcke On BEST ANSWER

The following query should do what you want (I named your table tx):

SELECT *
FROM tx t1
WHERE NOT EXISTS (
  SELECT *
  FROM tx t2
  WHERE t2.dept <> t1.dept
    AND t2.pk < t1.pk);

The idea is to look for tuples such that no tuple with a lesser pk and a different department exists.

  • The first two A tuples are kept;
  • The B tuples are dropped because of the first two A tuples;
  • The last A tuple is dropped because of the B tuples.
6
cmoron On

Its a little bit complex and probably, the permformance poor, but you can achieve what you want with the code below. There are four operations:

  1. The first one is where we obtain the base order and base group ids for the next operation.
  2. In the sencond operation we make the trick computing an unique group id for each group
  3. In the third operation, where are spreading the unique group id over the rows of each group.
  4. Finally, we compute a consecutive group id for each group to allow the discretionary selection of groups, so we only have to filter by the group number we want to obtain.

Hope this helps.

SELECT fourthOperation.pk,
       fourthOperation.dept 
 FROM (SELECT thirdOperation.pk,
              thirdOperation.dept,
              DENSE_RANK() OVER (ORDER BY thirdOperation.spreadedIdGroup) denseIdGroup
         FROM (SELECT secondOperation.*, 
                      NVL(idGroup, LAG(secondOperation.idGroup IGNORE NULLS) OVER (ORDER BY secondOperation.numRow)) spreadedIdGroup
              FROM (SELECT firstOperation.*,
                           CASE WHEN LAG(firstOperation.rankRow) OVER (ORDER BY firstOperation.numRow) = firstOperation.rankRow
                                THEN NULL
                                ELSE firstOperation.numRow
                                 END idGroup
                       FROM (SELECT yourTable.*, 
                                    ROW_NUMBER() OVER (ORDER BY PK)   AS numRow, 
                                    DENSE_RANK() OVER (ORDER BY DEPT) AS rankRow
                               FROM ABORRAR yourTable) firstOperation) secondOperation ) thirdOperation) fourthOperation
 WHERE fourthOperation.denseIdGroup = 1                                   
0
wind39 On

I'm not sure if I understand your question, but for the first pk of each dept you can try this:

select min(pk) as pk,
       dept
from your_table
group by dept
0
Abelisto On

Remember about stored functions. Unlike to using window functions its allows to avoid the reading of the whole table:

--drop function if exists foo();
--drop table if exists t;
create table t(pk int, dep text);
insert into t values(27,'A'),(29,'A'),(30,'B'),(31,'B'),(33,'A');

create function foo() returns setof t language plpgsql as $$
declare
  r t;
  p t;
begin
  for r in (select * from t order by pk) loop
    if p is null then
      p := r;
    end if;
    exit when p.dep is distinct from r.dep;
    return next r;
  end loop;
  return;
end $$;

select * from foo();