RAILS SQL Subquery optimization

1k views Asked by At

I have a query which looks like this:

@inventory =  Pack.find_by_sql("SELECT Packs.id, "+
" (SELECT COUNT(*) FROM Stocks WHERE (Stocks.pack_id = Packs.id AND Stocks.status = 'online'      AND Stocks.user_id = #{current_user.id})) AS online,"+
" (SELECT COUNT(*) FROM Stocks WHERE (Stocks.pack_id = Packs.id AND Stocks.status = 'offline'     AND Stocks.user_id = #{current_user.id})) AS offline,"+
" (SELECT COUNT(*) FROM Stocks WHERE (Stocks.pack_id = Packs.id AND Stocks.status = 'depositing'  AND Stocks.user_id = #{current_user.id})) AS depositing,"+
" (SELECT COUNT(*) FROM Stocks WHERE (Stocks.pack_id = Packs.id AND Stocks.status = 'withdrawing' AND Stocks.user_id = #{current_user.id})) AS withdrawing,"+
" (SELECT COUNT(*) FROM Stocks WHERE (Stocks.pack_id = Packs.id AND Stocks.status = 'selling'     AND Stocks.user_id = #{current_user.id})) AS selling,"+
" (SELECT COUNT(*) FROM Transactions WHERE (Transactions.pack_id = Packs.id AND Transactions.status = 'buying' AND Transactions.buyer_id = #{current_user.id})) AS buying"+
" FROM Packs WHERE disabled = false")

I am thinking there's a way to make a new sub-query so that instead of

SELECT FROM Stocks

the query selects from a stored table

SELECT FROM (Stocks WHERE (Stocks.pack_id = Packs.id AND Stocks.user_id = #{current_user.id}))

which would only be queried once. Then the WHERE Stocks.status = ? stuff would be applied to that stored table.

Any help guys?

2

There are 2 answers

5
Erwin Brandstetter On

The best query depends on data distribution and other details.

This is very efficient as long as most pack_id from the subqueries are actually used in the join to packs (most packs are NOT disabled):

SELECT p.id
     , s.online, s.offline, s.depositing, s.withdrawing, s.selling, t.buying
FROM   packs p
LEFT   JOIN (
   SELECT pack_id 
        , count(status = 'online'      OR NULL) AS online
        , count(status = 'offline'     OR NULL) AS offline
        , count(status = 'depositing'  OR NULL) AS depositing
        , count(status = 'withdrawing' OR NULL) AS withdrawing
        , count(status = 'selling'     OR NULL) AS selling
   FROM   stocks
   WHERE  user_id = #{current_user.id}
   AND    status = ANY('{online,offline,depositing,withdrawing,selling}'::text[])
   GROUP  BY 1
   ) s ON s.pack_id = p.id 
LEFT   JOIN (
   SELECT pack_id, count(*) AS buying
   FROM   transactions
   WHERE  status = 'buying'
   AND    buyer_id = #{current_user.id}
   ) t ON  t.pack_id = p.id
WHERE  NOT p.disabled;

In pg 9.4 you can use the aggregate FILTER clause:

SELECT pack_id 
     , count(*) FILTER (WHERE status = 'online')      AS online
     , count(*) FILTER (WHERE status = 'offline')     AS offline
     , count(*) FILTER (WHERE status = 'depositing')  AS depositing
     , count(*) FILTER (WHERE status = 'withdrawing') AS withdrawing
     , count(*) FILTER (WHERE status = 'selling')     AS selling
FROM   stocks
WHERE  ... 

Details:

Use crosstab() for the pivot table to make that faster, yet:

SELECT p.id
     , s.online, s.offline, s.depositing, s.withdrawing, s.selling, t.buying
FROM   packs p
LEFT   JOIN  crosstab(
   $$
   SELECT pack_id, status, count(*)::int AS ct
   FROM   stocks
   WHERE  user_id = $$ || #{current_user.id} || $$
   AND    status = ANY('{online,offline,depositing,withdrawing,selling}'::text[])
   GROUP  BY 1, 2
   ORDER  BY 1, 2
   $$
  ,$$SELECT unnest('{online,offline,depositing,withdrawing,selling}'::text[])$$
    ) s (pack_id int
      , online int
      , offline int
      , depositing int
      , withdrawing int
      , selling int
       ) USING (pack_id)
LEFT   JOIN (
   SELECT pack_id, count(*) AS buying
   FROM   transactions
   WHERE  status = 'buying'
   AND    buyer_id = #{current_user.id}
   ) t ON  t.pack_id = p.id
WHERE  NOT p.disabled;

Details here:

If most packs are disabled, LATERAL joins will be faster (requires pg 9.3 or later):

SELECT p.id
     , s.online, s.offline, s.depositing, s.withdrawing, s.selling, t.buying
FROM   packs p
LEFT   JOIN LATERAL (
   SELECT pack_id 
        , count(status = 'online'      OR NULL) AS online
        , count(status = 'offline'     OR NULL) AS offline
        , count(status = 'depositing'  OR NULL) AS depositing
        , count(status = 'withdrawing' OR NULL) AS withdrawing
        , count(status = 'selling'     OR NULL) AS selling
   FROM   stocks
   WHERE  user_id = #{current_user.id}
   AND    status = ANY('{online,offline,depositing,withdrawing,selling}'::text[])
   AND   pack_id = p.id
   GROUP  BY 1
   ) s ON TRUE
LEFT   JOIN LATERAL (
   SELECT pack_id, count(*) AS buying
   FROM   transactions
   WHERE  status = 'buying'
   AND    buyer_id = #{current_user.id}
   AND   pack_id = p.id
   ) t ON TRUE
WHERE  NOT p.disabled;

Why LATERAL? And are there alternatives in pg 9.1?

0
khampson On

If what you're after is a count of the various types, something like the following would be much less code and easier to read/maintain, IMO...

You could split them up into the different tables, so, for stocks, something like this:

@inventory = Pack.find_by_sql("SELECT status, count(*)
                               FROM stocks
                               WHERE user_id = ?
                               GROUP BY status
                               ORDER BY status", current_user.id)

Note the importance of using ? to prevent SQL injection. Also, Ruby supports multiline strings, so there's no need to quote and concatenate every line.

You can do something similar for the other tables.