Create a database view joining multiple tables

129 views Asked by At

I would like to get some help in creating a database view. My DB schema looks as following:

products            (id, ignored_comments_ids (array))
activities          (id)
comments            (id)
activities_comments (activity_id comment_id)
products_comments   (product_id, comment_id)
offers              (product_id, activity_id)

Now I need to create a view of all products' comments with custom column named source:

  • source = 'OFFER': comments coming from products.offers.activities.comments association
  • source = 'DIRECT': comments coming from products.comments association

    Also, the view should exclude comments from products.ignored_comments_ids

How do I do that? The view has to have product_id, source and all columns from comments table.

I came up with the following view, how can I improve it?

CREATE OR REPLACE VIEW all_comments AS
  WITH the_comments AS (
    SELECT
      comments.*,
      'OFFER'     AS source,
      products.id AS product_id
    FROM comments
    JOIN activities_comments ON activities_comments.comment_id = comments.id
    JOIN activities          ON activities.id = activities_comments.activity_id
    JOIN offers              ON offers.activity_id = activities.id
    JOIN products            ON products.id = offers.product_id
  UNION
    SELECT
      comments.*,
      'DIRECT'    AS source,
      products.id AS product_id
    FROM comments
    JOIN products_comments ON products_comments.comment_id = comments.id
    JOIN products          ON products.id = products_comments.product_id
  )
  SELECT DISTINCT ON (the_comments.id)
    the_comments.id,
    the_comments.name,
    the_comments.source,
    the_comments.product_id
  FROM the_comments
  JOIN products ON products.id = the_comments.product_id
  WHERE NOT to_json(products.ignored_comment_ids)::jsonb @> the_comments.id::jsonb
  ORDER BY the_comments.id;
1

There are 1 answers

1
Adrian Maxwell On

UNION may be used to combine 2 sets of data, AND, at the same time it will remove duplicate rows. UNION ALL may be used to combine 2 sets of data (and then stops). So UNION ALL avoids the overhead of searching for, and removing, duplicated rows, so it is faster.

Within your initial common table expression (cte) the_comments you force each side of a union to use different constants, e.g.

select *
from (
    select 1 as id, 'OFFER' AS source
    union
    select 1 as id, 'DIRECT' AS source
    ) d
;
result:
  id   source  
 ---- -------- 
   1   DIRECT  
   1   OFFER 

Even though id 1 is in both sides of that union, because of the different constants 2 rows are returned by that example query. So please use UNION ALL instead.

Despite the convenience of select * it should not be used in views (although there are arguments both ways, e.g. here). Perhaps this was done to simplify the question, but I hope it isn't used literally as seen. If the purpose of the view is to only return 4 columns then specify just those columns.

Although you need product_id in the output, this can be sourced from offers.product_id or products_comments.product_id so you don't actually need to join to the products table. There is no need for join to the products table after the cte either.

Because we are now using UNION ALL I cannot see any benefit is using SELECT DISTINCT ON(...), I suspect this is just an overhead that can be removed. Obviously I cannot verify this and it may depend purely on your functional requirements. Also note that SELECT DISTINCT ON(...) will remove a source which you have so carefully introduced e.g.

select distinct on (id) id, source
from (
    select 1 as id, 'OFFER' AS source
    union
    select 1 as id, 'DIRECT'AS source
    ) d
;
result:
  id   source  
 ---- -------- 
   1   DIRECT   

Do not include an order by clause in any view, only order a "final query". In other words; if you create a view you are then likely to use it in several other queries. Each of those queries will likely have their own where clause and need differing result orders. If you order the view you are just consuming cpu cycles and then dispensing with that effort later. So, please remove the order by clause.

I would dearly like to suggest a different approach to the final where clause, but as I don't deal with JSON very much I don't have sufficient experience to propose an alternative. However using functions on data in a where clause is almost always a cause of poor performance, most notably because it usually removes access to indexes on the columns involved in those functions. Finding a more efficient way to exclude comment exceptions is likely to make the largest improvement to your queries' performance.

So, my suggestions would lead to this:

WITH the_comments
AS (
    SELECT
        comments.id
      , comments.name
      , 'OFFER' AS source
      , offers.product_id AS product_id
    FROM comments
    JOIN activities_comments ON activities_comments.comment_id = comments.id
    JOIN activities ON activities.id = activities_comments.activity_id
    JOIN offers ON offers.activity_id = activities.id
    UNION ALL
    SELECT
        comments.id
      , comments.name
      , 'DIRECT' AS source
      , products_comments.product_id AS product_id
    FROM comments
    JOIN products_comments ON products_comments.comment_id = comments.id
    )
SELECT
    the_comments.id
  , the_comments.name
  , the_comments.source
  , the_comments.product_id
FROM the_comments
/* perhaps raise a separate question on this bit */
WHERE NOT to_json(products.ignored_comment_ids)::jsonb @> the_comments.id::jsonb