Self join inside table by multiple values. SQL, Postgresql

61 views Asked by At

Please help with self join inside one table by different fields. given table "Items":

Used PostgreSQL.

I need to get all the values from such table, for which factId = 'fact:170' and factId1 ='fact:171' and factId1.value ='false' and arrayIndexes are the same.

It means factId 170 & 171 have relation one-to one using the same arrayIndex.

So all the values of factId 'fact:170' are needed, for which there is related by same arrayIndex factId 'fact:171' which have value 'false'.

Here is column Item in the table, it means for each Item we have own set of values 170 and 171. Thank you.

I tryed this but I am not sure this works fine and fast:

SELECT l.value
FROM Items l
INNER JOIN Items p on p.item  = l.item
INNER JOIN Items c on c.item  = l.item
WHERE l.factid  = 'fact:170' AND c.factid = 'fact:170'  
AND p.factid  = 'fact:171' AND p.value='false' 
AND p.arrayindex = l.arrayindex

there are a lot of duplicates after work. Please check my query.

2

There are 2 answers

2
Albina On

Your query can be:

with t as (
  select distinct arrayindex
  from items
  where factid = 'urn:drug:171' and value = 'false'
)
select * from items
where arrayindex in (select * from t)
  and (factid = 'urn:drug:170' or factid = 'urn:drug:171') -- optional condition;

DB fiddle

Details:

  1. t CTE query finds unique arrayindex values for all rows which fulfill the condition
    where factid = 'urn:drug:171' and value = 'false';
  2. Outer query finds all the rows with factid = 'urn:drug:170'. Result rows are filtered by the arrayindex values which we received from the previous step.
1
Marmite Bomber On

Your apparent problem are the duplicates in the source table.

The 170 row appears 3 times in your date, the 171 row two times.

To get a one-to one relation you must must somehow introduce a unique identification - I use the row_number.

Than it is a plain join with additional predicate on the row number

Query (with sample data as CTE)

with tab as (
select * from (values 
('urn:fact:drug:170', '88cafe', 15),
('urn:fact:drug:170', '88cafe', 15),
('urn:fact:drug:170', '88cafe', 15),
('urn:fact:drug:171', 'false', 15),
('urn:fact:drug:171', 'false', 15)
)
t (factId,value,arrayIndexes)
),
-- from here your query --
t170 as (
select tab.*,
row_number() over (order by value) as rn
from tab
where factId = 'urn:fact:drug:170'
),
t171 as (
select tab.*,
row_number() over (order by value) as rn
from tab
where factId = 'urn:fact:drug:171' and value ='false'
)
select 
 t170.factid, t170.value, t170.arrayIndexes,
 t171.factid factid2c
from t170
join t171 on t170.arrayIndexes = t171.arrayIndexes and
t170.rn = t171.rn;

Result

factid           |value |arrayindexes|factid2c         |
-----------------+------+------------+-----------------+
urn:fact:drug:170|88cafe|          15|urn:fact:drug:171|
urn:fact:drug:170|88cafe|          15|urn:fact:drug:171|

So you can join two rows, the third is skipped as there is no match on both sides