I would like to get some help in creating a database view. My DB schema looks as following:
products (id, ignored_comments_ids (array))
activities (id)
comments (id)
activities_comments (activity_id comment_id)
products_comments (product_id, comment_id)
offers (product_id, activity_id)
Now I need to create a view of all products' comments with custom column named source
:
source
= 'OFFER': comments coming fromproducts.offers.activities.comments
associationsource
= 'DIRECT': comments coming fromproducts.comments
associationAlso, the view should exclude comments from
products.ignored_comments_ids
How do I do that? The view has to have product_id
, source
and all columns from comments
table.
I came up with the following view, how can I improve it?
CREATE OR REPLACE VIEW all_comments AS
WITH the_comments AS (
SELECT
comments.*,
'OFFER' AS source,
products.id AS product_id
FROM comments
JOIN activities_comments ON activities_comments.comment_id = comments.id
JOIN activities ON activities.id = activities_comments.activity_id
JOIN offers ON offers.activity_id = activities.id
JOIN products ON products.id = offers.product_id
UNION
SELECT
comments.*,
'DIRECT' AS source,
products.id AS product_id
FROM comments
JOIN products_comments ON products_comments.comment_id = comments.id
JOIN products ON products.id = products_comments.product_id
)
SELECT DISTINCT ON (the_comments.id)
the_comments.id,
the_comments.name,
the_comments.source,
the_comments.product_id
FROM the_comments
JOIN products ON products.id = the_comments.product_id
WHERE NOT to_json(products.ignored_comment_ids)::jsonb @> the_comments.id::jsonb
ORDER BY the_comments.id;
UNION
may be used to combine 2 sets of data, AND, at the same time it will remove duplicate rows.UNION ALL
may be used to combine 2 sets of data (and then stops). SoUNION ALL
avoids the overhead of searching for, and removing, duplicated rows, so it is faster.Within your initial common table expression (cte)
the_comments
you force each side of a union to use different constants, e.g.Even though id 1 is in both sides of that union, because of the different constants 2 rows are returned by that example query. So please use
UNION ALL
instead.Despite the convenience of
select *
it should not be used in views (although there are arguments both ways, e.g. here). Perhaps this was done to simplify the question, but I hope it isn't used literally as seen. If the purpose of the view is to only return 4 columns then specify just those columns.Although you need product_id in the output, this can be sourced from
offers.product_id
orproducts_comments.product_id
so you don't actually need to join to the products table. There is no need for join to the products table after the cte either.Because we are now using
UNION ALL
I cannot see any benefit is usingSELECT DISTINCT ON(...)
, I suspect this is just an overhead that can be removed. Obviously I cannot verify this and it may depend purely on your functional requirements. Also note thatSELECT DISTINCT ON(...)
will remove asource
which you have so carefully introduced e.g.Do not include an
order by
clause in any view, only order a "final query". In other words; if you create a view you are then likely to use it in several other queries. Each of those queries will likely have their own where clause and need differing result orders. If you order the view you are just consuming cpu cycles and then dispensing with that effort later. So, please remove the order by clause.I would dearly like to suggest a different approach to the final where clause, but as I don't deal with JSON very much I don't have sufficient experience to propose an alternative. However using functions on data in a where clause is almost always a cause of poor performance, most notably because it usually removes access to indexes on the columns involved in those functions. Finding a more efficient way to exclude comment exceptions is likely to make the largest improvement to your queries' performance.
So, my suggestions would lead to this: