WHERE clause using to not select rows with timestamps 50ms either side of it?

141 views Asked by At

I have part of a table like this:

 timestamp                  | Source
----------------------------+----------
 2017-07-28 14:20:28.757464 | Stream
 2017-07-28 14:20:28.775248 | Poll
 2017-07-28 14:20:29.777678 | Poll
 2017-07-28 14:21:28.582532 | Stream

I want to achieve this:

 timestamp                  | Source
----------------------------+----------
 2017-07-28 14:20:28.757464 | Stream
 2017-07-28 14:20:29.777678 | Poll
 2017-07-28 14:21:28.582532 | Stream

Where the 2nd row in the original table had been removed, because it's within 50ms of a timestamp before or after it. Important is only removes rows when Source = 'Poll'.

Not sure how this can be achieved with a WHERE clause maybe?

Thanks in advance for any help.

1

There are 1 answers

0
Goran Rakic On

Whatever we do, we can limit that to Pools, then union those rows with Streams.

with 
streams as (
 select *
 from test 
 where Source = 'Stream'  
),
pools as (
  ...
)

(select * from pools) union (select * from streams) order by timestamp

To get pools, there are different options:

Correlated subquery

For each row we run extra query to get the previous row with the same source, then select only those rows where there is no previous timestamp (first row) or where previous timestamp is more than 50ms older.

with 
...
pools_with_prev as (
  -- use correlated subquery
  select 
    timestamp, Source, 
    timestamp - interval '00:00:00.05' 
      as timestamp_prev_limit,
    (select max(t2.timestamp)from test as t2 
      where t2.timestamp < test.timestamp and
     t2.Source = test.Source) 
      as timestamp_prev
  from test
),
pools as (
  select timestamp, Source
  from pools_with_prev
  -- then select rows which are >50ms apart
  where timestamp_prev is NULL or
  timestamp_prev < timestamp_prev_limit
)

...

https://www.db-fiddle.com/f/iVgSkvTVpqjNZ5F5RZVSd2/2

Join two sliding tables

Instead running subquery for each row, we can just create a copy of our table and slide it so each Pool row joins with the previous row of the same source type.

with 
...
pools_rn as (
 -- add extra row number column
 -- rows: 1, 2, 3
 select *,
  row_number() over (order by timestamp) as rn
 from test
 where Source = 'Pool'  
),
pools_rn_prev as (
 -- add extra row number column increased by one
 -- like sliding a copy of the table one row down
 -- rows: 2, 3, 4
 select timestamp as timestamp_prev,
  row_number() over (order by timestamp)+1 as rn
 from test
 where Source = 'Pool'  
),
pools as (
 -- now join prev two tables on this column
 -- each row will join with its predecessor
 select timestamp, source 
 from pools_rn
  left outer join pools_rn_prev
  on pools_rn.rn = pools_rn_prev.rn
 where 
  -- then select rows which are >50ms apart
  timestamp_prev is null or
  timestamp - interval '00:00:00.05' > timestamp_prev
)

...

https://www.db-fiddle.com/f/gXmSxbqkrxpvksE8Q4ogEU/2

Sliding window

Modern SQL can do something similar, with partitioning by source, then using sliding window to join with the previous row.

with 
...
pools_with_prev as (
  -- use sliding window to join prev timestamp
  select *, 
    timestamp - interval '00:00:00.05' 
      as timestamp_prev_limit,
    lag(timestamp) over(
      partition by Source order by timestamp
    ) as timestamp_prev
  from test
),
pools as (
  select timestamp, Source
  from pools_with_prev
  -- then select rows which are >50ms apart
  where timestamp_prev is NULL or
  timestamp_prev < timestamp_prev_limit
)


...

https://www.db-fiddle.com/f/8KfTyqRBU62SFSoiZfpu6Q/1

I believe this is the most optimal.