pl/python receiving and outputting a set of rows


Problem description: I'm trying to define a pl/python aggregate that receives a set of rows and returns a set of rows, once for every group of rows produced by a query. The input set should be one group generated by the GROUP BY clause of the query. I'm trying to find my way through the syntax, but I'm having a hard time with it. pl/pgsql functions are welcome, but I'm afraid they may lack the expressive power to do this.

Context description: I have a table with three columns: ticker symbol, date and value. I'm trying to calculate the MACD for each ticker symbol and date from those values. I can group the table by date and sort the groups by date. Then, for each group, I only have to calculate the aggregate for that date from the current value of each ticker and the previous value of the aggregate, which I guess I will store in SD. All I have to return is a set of rows with that value, possibly together with the ticker so the result can be joined back, though I think it should be possible to preserve the order and save the cost of a join.
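To make the per-day update concrete, here is a minimal sketch in plain Python rather than pl/python. The names `ema_step`, `macd_for_day` and `state` are made up for illustration, with `state` standing in for SD. The 12/26 periods and seeding each EMA with the first value are assumed conventions, not something fixed by the table or the query, and the signal line is left out.

```python
# Plain-Python sketch of the per-day MACD update; `state` stands in for SD.
# Assumption: the common 12/26 periods, with each EMA seeded by the first value.
FAST, SLOW = 12, 26

def ema_step(prev, value, period):
    """One exponential-moving-average update with alpha = 2 / (period + 1)."""
    alpha = 2.0 / (period + 1)
    return value if prev is None else alpha * value + (1 - alpha) * prev

def macd_for_day(state, values):
    """Update the per-ticker EMAs with one day's values (ordered by ticker).

    Returns one MACD value (fast EMA minus slow EMA) per ticker position.
    """
    fast = state.setdefault("fast", [None] * len(values))
    slow = state.setdefault("slow", [None] * len(values))
    result = []
    for i, value in enumerate(values):
        fast[i] = ema_step(fast[i], value, FAST)
        slow[i] = ema_step(slow[i], value, SLOW)
        result.append(fast[i] - slow[i])
    return result

state = {}
day1 = macd_for_day(state, [10.0, 20.0])  # first day: both EMAs equal the value
day2 = macd_for_day(state, [11.0, 19.0])  # second day: EMAs start to diverge
```

Each call handles one day for all tickers at once, which is why the groups need to arrive ordered by day and, within a day, by ticker.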

This is how the table would be created:

create table candles(ticker text, day date, value real,
  primary key (ticker, day));

Here you can find an example in Java (pseudocode-ish) that shows how the groups (group by day) should be processed once they are sorted by ticker. In Java each group is represented as an ArrayList. Since the groups are sorted by ticker, the same position always corresponds to the same ticker (the database is checked on generation, so no records are missing). Since they are grouped by day, every group holds the values for a single day: the day after the one handled in the previous call.

PS: I've also added the code to calculate MACD in pl/pgsql, but for this I order by ticker and day. I'd like to order by day instead and do the calculations for all the tickers in a loop, because that should be more efficient, especially considering that the final output has to be ordered by day to be consumed this way.

If there is anything that is not completely clear please ask me, I'm trying to be clear and concise.

Thank you very much.


There are 2 answers

Trylks On BEST ANSWER

Sorry if the description was confusing. I solved the problem I was facing by:

  1. creating arrays from the values I wanted to group, using a subquery: select array_agg(x) from y group by z order by v;
  2. creating a function that accepts the arrays, just by adding [] to the parameter types in its declaration, and an aggregate built on that function.
  3. using the aggregator and the subquery in the final query.

It was fairly easy, but I didn't know about the array type and all its possibilities. Sorry and thank you.
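The three steps above can be sketched in plain Python to show the data flow; this is not the actual SQL or pl/python code. The sample rows, `arrays_per_day` and `add_arrays` are hypothetical, and the element-wise running total is only a placeholder for the real calculation.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows shaped like the candles table: (ticker, day, value).
rows = [
    ("BBB", "2012-01-04", 20.0),
    ("AAA", "2012-01-04", 10.0),
    ("AAA", "2012-01-05", 11.0),
    ("BBB", "2012-01-05", 19.0),
]

def arrays_per_day(rows):
    """Step 1: one array of values per day, ordered by ticker (like array_agg)."""
    ordered = sorted(rows, key=itemgetter(1, 0))  # sort by day, then by ticker
    for day, group in groupby(ordered, key=itemgetter(1)):
        yield day, [value for _, _, value in group]

def add_arrays(state, values):
    """Step 2: a function that takes a whole array; here an element-wise total."""
    if state is None:
        return list(values)
    return [s + v for s, v in zip(state, values)]

# Step 3: feed the per-day arrays to the function, carrying state between days.
state = None
totals = []
for day, values in arrays_per_day(rows):
    state = add_arrays(state, values)
    totals.append((day, state))
```

Because every array is ordered by ticker, position i refers to the same ticker on every day, so no join is needed to line the values up.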

Twelfth On

I'll take a shot in the dark now that I've looked at what you're trying to do.

Select symbol, value
from a_table_with_three_columns
where date = '2012-01-05'

Just some 'stack overflow' tips for you...giving us column names as 'value' or 'date' instead of your actual column names or neglecting table names results in us only being able to give silly pseudo code like above that won't work directly in your DB. In the future, give us your table name and columns...preferably the table create statement.

The SQL above gives you every record and its value on that day (I'm assuming your data has one row per day rather than hourly time slots). Now we can take that query and join it to itself, using the previous day for the second copy:

select a.symbol, a.value - b.value
from
(Select symbol, value
from a_table_with_three_columns
where date = '2012-01-05') a
inner join
(Select symbol, value
from a_table_with_three_columns
where date = '2012-01-04') b
on a.symbol = b.symbol

This will give you a list of each symbol and the difference between its values on the two days. With a little modification, you can do this join on a datediff function and include the date as part of your data set. This would return the symbol, the day, and the difference in value from the previous day. I can step you through that part as well, though I'd appreciate better schema data from you so I don't need to use 'a_table_with_three_columns' ;)
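To illustrate what that modified join would return, here is a rough sketch of the same logic in plain Python instead of SQL. The sample rows and the `daily_differences` helper are hypothetical, and "previous day" is assumed to mean exactly one calendar day earlier, so a gap such as a weekend simply produces no row.

```python
from datetime import date, timedelta

# Hypothetical rows: (symbol, day, value).
rows = [
    ("AAA", date(2012, 1, 4), 10.0),
    ("AAA", date(2012, 1, 5), 11.5),
    ("BBB", date(2012, 1, 4), 20.0),
    ("BBB", date(2012, 1, 5), 19.0),
]

def daily_differences(rows):
    """Pair each row with the same symbol's row one calendar day earlier."""
    by_key = {(symbol, day): value for symbol, day, value in rows}
    result = []
    for symbol, day, value in rows:
        previous = by_key.get((symbol, day - timedelta(days=1)))
        if previous is not None:  # inner join: skip rows with no previous day
            result.append((symbol, day, value - previous))
    return result

diffs = daily_differences(rows)
```

Rows for the first available day are dropped, just as an inner join would drop them.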