postgres generate array using slide window

282 views Asked by At

I'm trying to figure out how to a query to generate an ARRAY given a sliding window over a character column with Postgre.

For example, if I have this:

   pid           
   <chr>         
 1 WP_096038768.1
 2 WP_013465871.1
 3 WP_058155244.1
 4 WP_011329269.1
 5 WP_058374608.1
 6 WP_089368983.1
 7 WP_096739105.1
 8 WP_089346667.1
 9 WP_096041177.1
10 WP_010553306.1
...

I want a sliding window of size 1 before and after the row. The result is this:

   pid            g                                           
   <chr>          <chr>                                       
 1 WP_013465871.1 WP_096038768.1,WP_013465871.1,WP_058155244.1
 2 WP_058155244.1 WP_013465871.1,WP_058155244.1,WP_011329269.1
 3 WP_011329269.1 WP_058155244.1,WP_011329269.1,WP_058374608.1
 4 WP_058374608.1 WP_011329269.1,WP_058374608.1,WP_089368983.1
 5 WP_089368983.1 WP_058374608.1,WP_089368983.1,WP_096739105.1
 6 WP_096739105.1 WP_089368983.1,WP_096739105.1,WP_089346667.1
 7 WP_089346667.1 WP_096739105.1,WP_089346667.1,WP_096041177.1
 8 WP_096041177.1 WP_089346667.1,WP_096041177.1,WP_010553306.1
 9 WP_010553306.1 WP_096041177.1,WP_010553306.1,WP_007376542.1
10 WP_007376542.1 WP_010553306.1,WP_007376542.1,WP_039038284.1
...

Any hint is appreciated.

This example I did with R:

library(tidyverse)
library(dbplyr)
library(RPostgreSQL) 
library(DBI)

st2tm %>% 
  mutate(
    p1 = lag(pid),
    p2 = lead(pid)
  ) %>% 
  group_by(pid) %>% 
  mutate(g = paste(na.omit(c(p1,pid,p2)), sep = ",")) %>% 
  ungroup() %>% 
  select(-c(p1, p2)) %>% 
  filter(str_count(g,",")==2)

But when applied to a Postgres table through a DBI connection it fails with

Error in vapply(x, escape, character(1), con = con) : 
  values must be length 1,
 but FUN(X[[1]]) result is length 3

at paste and Error: str_count() is not available in this SQL variant at the filter.

Also, I think some smarter strategy.

1

There are 1 answers

2
Simon.S.A. On BEST ANSWER

This is most likely due to dbplyr not having translations defined for converting na.omit or str_count into postgresql (a translation for paste is most likely defined).

You can replace str_count and na.omit by checking earlier for missing values.

st2tm %>% 
  mutate(
    p1 = lag(pid),
    p2 = lead(pid)
  ) %>% 
  filter(!is.na(p1),
         !is.na(p2)) %>%
  mutate(g = paste(p1, ",", pid, ",", p2)) %>% 
  select(-c(p1, p2)) %>% 

And if paste is the issue you could replace it with postgresql's inbuilt CONCAT function.

st2tm %>% 
  mutate(
    p1 = lag(pid),
    p2 = lead(pid)
  ) %>% 
  filter(!is.na(p1),
         !is.na(p2)) %>%
  mutate(g = CONCAT(p1, ",", pid, ",", p2)) %>% 
  select(-c(p1, p2)) %>% 

Because CONCAT is not an R function, dbplyr will pass it as written to postgresql rather than attempting to translate it.