How do I do LIMIT within GROUP in the same table?

210 views Asked by At

I can't figure out how to do limit within group although I've read all similar questions here. Reading PSQL doc didn't help either :( Consider the following:

CREATE TABLE article_relationship
(
    article_from INT NOT NULL,
    article_to INT NOT NULL,
    score INT
);

I want to get a list of top 5 related articles per given article IDs sorted by score.

Here is what I tried:

select DISTINCT o.article_from
from article_relationship o
join lateral (
       select i.article_from, i.article_to, i.score from article_relationship i
       order by score desc
       limit 5
       ) p on p.article_from = o.article_from
where o.article_from IN (18329382, 61913904, 66538293, 66540477, 66496909)
order by o.article_from;

And it returns nothing. I was under impression that outer query is like loop so I guess I only need source IDs there.

Also what if I want to join on articles table where there are columns id and title and get titles of related articles in resultset?

I added join in inner query:

select o.id, p.*
from articles o
join lateral (
       select a.title, i.article_from, i.article_to, i.score
       from article_relationship i
       INNER JOIN articles a on a.id = i.article_to
       where i.article_from = o.id
       order by score desc
       limit 5
       ) p on true
where o.id IN (18329382, 61913904, 66538293, 66540477, 66496909)
order by o.id;

But it made it very very slow.

2

There are 2 answers

2
Patrick On BEST ANSWER

The problem with no rows returning from your query is that your join condition is wrong: ON p.article_from = o.article_from; this should obviously be ON p.article_from = o.article_to.

That issue aside, your query will not return the top 5 scoring relations per article id; instead it will return the article IDs that reference one of the 5 top rated referenced articles throughout the table and (also) at least 1 of the 5 referenced articles for which you specify the id.

You can get the top 5 rated referenced articles per referencing article with a window function to rank the scores in a sub-select and then select only the top 5 in the main query. Specifying a list of referenced article IDs effectively means that you will rank how these referenced articles are scored for each referencing article:

SELECT article_from, article_to, score
FROM (
    SELECT article_from, article_to, score, 
           rank() OVER (PARTITION BY article_from ORDER BY score DESC) AS rnk
    FROM article_relationship
    WHERE article_to IN (18329382, 61913904, 66538293, 66540477, 66496909) ) a
WHERE rnk < 6
ORDER BY article_from, score DESC;

This is different from your code in that it returns up to 5 records for each article_from but it is consistent with your initial description.

Adding columns from table articles is trivially done in the main query:

SELECT a.article_from, a.article_to, a.score, articles.*
FROM (
    SELECT article_from, article_to, score, 
           rank() OVER (PARTITION BY article_from ORDER BY score DESC) AS rnk
    FROM article_relationship
    WHERE article_to IN (18329382, 61913904, 66538293, 66540477, 66496909) ) a
JOIN articles ON articles.id = a.article_to
WHERE a.rnk < 6
ORDER BY a.article_from, a.score DESC;
0
expert On

Version with join lateral

select o.id as from_id, p.article_to as to_id, a.title, a.journal_id, a.pub_date_p from articles o
  join lateral (
       select i.article_to from article_relationship i
       where i.article_from = o.id
       order by score desc
       limit 5
       ) p on true
  INNER JOIN articles a on a.id = p.article_to
where o.id IN (18329382, 61913904, 66538293, 66540477, 66496909)
order by o.id;