I'm running a query on a database that contains over 120 million rows and 200 columns. The dataset contains loan information: each borrower has about 15 rows (one observation per quarter), and borrowerID is also the primary key. I also join on this borrowerID to retrieve information from the previous quarter, which I understand makes my query quite complex. The database also contains a foreign key (FK) on Bank. The party that hosts the DB explained to me that if I write a function to run the query in parts, by bank, it would be much faster. Currently my query looks like this:
CREATE OR REPLACE VIEW myview AS
SELECT a.id,
a.arrears AS "Months in Arrears",
a.balance AS "Original Balance",
a.bank AS "Bank",
b.arrears AS "Next Months in Arrears",
b.balance AS "Next Original Balance",
dense_rank() OVER (PARTITION BY concat(a.bank, a.id), ("substring"(a.date::text, 1, 4) || 'Q'::text) || round(0.3333 + "substring"(a.date::text, 6, 2)::numeric / 3::numeric, 0) ORDER BY a.date DESC) AS "Duplicate"
FROM loandata a
LEFT JOIN loandata b ON a.bank::text = b.bank::text AND a.id::text = b.id::text AND round((b.date::date - a.date::date)::numeric / 30::numeric, 0) = 3::numeric;
ALTER TABLE myview
OWNER TO me;
GRANT ALL ON TABLE myview TO me;
Could you help me write a function so that this query is executed bank by bank? Or would it make more sense to do it per id, since that is the primary key? I'm running this view to download and store the data in Tableau.
Many thanks in advance. Best, Tim
PS. The dense_rank function is there to find loans that have multiple observations per quarter.
The EXPLAIN ANALYZE results suggested in the comments can be found here: http://explain.depesz.com/s/K19 I used a subset of the data where I selected one bank.
I am not quite sure about the exact intention of the query, but using date functions instead of computing in numeric types, and avoiding all the casts and concat() calls, will probably get you somewhere.
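For example, both the quarter label and the "same loan, one quarter later" join condition can be written with date functions instead of substring/round arithmetic. A sketch, assuming the date column is of type date (or castable to it):

```sql
-- Quarter label, e.g. '2013Q2', replacing the substring/round expression:
to_char(a.date::date, 'YYYY"Q"Q')

-- Join on the observation exactly one quarter later,
-- replacing the division by 30 days:
LEFT JOIN loandata b
       ON b.bank = a.bank
      AND b.id   = a.id
      AND date_trunc('quarter', b.date::date)
        = date_trunc('quarter', a.date::date) + interval '3 months'
```

The same to_char() expression can then be used in the PARTITION BY of the dense_rank() call, listing a.bank and a.id as separate partition columns instead of concat(a.bank, a.id).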
BTW: don't use columns called date. It is a type name in SQL (it is highlighted in green in my editor).
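If you do still want to run the query one bank at a time, as the DB host suggested, a minimal PL/pgSQL sketch could look like the following. Table and column names are taken from the question; the function name and the pre-created results table loan_pairs are assumptions:

```sql
-- Assumes a results table loan_pairs with matching columns already exists.
CREATE OR REPLACE FUNCTION load_pairs_by_bank() RETURNS void AS $$
DECLARE
    bk text;
BEGIN
    -- Process one bank per iteration so each join touches a smaller slice.
    FOR bk IN SELECT DISTINCT bank FROM loandata LOOP
        INSERT INTO loan_pairs
        SELECT a.id, a.arrears, a.balance, a.bank, b.arrears, b.balance
        FROM loandata a
        LEFT JOIN loandata b
               ON b.bank = a.bank
              AND b.id   = a.id
              AND round((b.date::date - a.date::date)::numeric / 30, 0) = 3
        WHERE a.bank = bk;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

Each iteration restricts both sides of the self-join to a single bank, which is the partitioning the host recommended; whether this actually beats one big query depends on the indexes on (bank, id, date).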