I have the following table structure:
CREATE TABLE mytable (
id serial PRIMARY KEY,
data jsonb
);
And the following data (partial for brevity...notice the randomness of years and the sales/expense yrs don't align with each other):
INSERT INTO mytable (data)
VALUES
('{"employee": "Jim Romo",
"sales": [{"value": 10, "yr": "2012"}, {"value": 5, "yr": "2013"}, {"value": 40, "yr": "2014"}],
"expenses": [{"value": 2, "yr": "2007"}, {"value": 1, "yr": "2013"}, {"value": 3, "yr": "2014"}],
"product": "tv", "customer": "1", "updated": "20150501"
}'),
('{"employee": "Jim Romo",
"sales": [{"value": 10, "yr": "2012"}, {"value": 5, "yr": "2013"}, {"value": 41, "yr": "2014"}],
"expenses": [{"value": 2, "yr": "2009"}, {"value": 3, "yr": "2013"}, {"value": 3, "yr": "2014"}],
"product": "tv", "customer": "2", "updated": "20150312"
}'),
('{"employee": "Jim Romo",
"sales": [{"value": 20, "yr": "2012"}, {"value": 25, "yr": "2013"}, {"value": 33, "yr": "2014"}],
"expenses": [{"value": 8, "yr": "2012"}, {"value": 12, "yr": "2014"}, {"value": 5, "yr": "2009"}],
"product": "radio", "customer": "2", "updated": "20150311"
}'),
('{"employee": "Bill Baker",
"sales": [{"value": 1, "yr": "2010"}, {"value": 2, "yr": "2009"}, {"value": 3, "yr": "2014"}],
"expenses": [{"value": 3, "yr": "2011"}, {"value": 1, "yr": "2012"}, {"value": 7, "yr": "2013"}],
"product": "tv", "customer": "1", "updated": "20150205"
}'),
('{"employee": "Bill Baker",
"sales": [{"value": 10, "yr": "2010"}, {"value": 12, "yr": "2011"}, {"value": 3, "yr": "2014"}],
"expenses": [{"value": 4, "yr": "2011"}, {"value": 7, "yr": "2009"}, {"value": 4, "yr": "2013"}],
"product": "radio", "customer": "1", "updated": "20150204"
}'),
('{"employee": "Jim Romo",
"sales": [{"value": 22, "yr": "2009"}, {"value": 17, "yr": "2013"}, {"value": 35, "yr": "2014"}],
"expenses": [{"value": 14, "yr": "2011"}, {"value": 13, "yr": "2014"}, {"value": 8, "yr": "2013"}],
"product": "tv", "customer": "3", "updated": "20150118"
}')
For each employee I need to evaluate the most-recently updated row and find employees with 2014 tv sales greater than 30. From there I need to further filter for employees with average tv expenses of less than 5. For the average I just need to take ALL of their tv expenses and not just the latest row.
My expected output would be 1 row:
employee | customer | 2014 tv sales | 2013 avg tv expenses
------------+----------+-----------------+----------------------
Jim Romo | 1 | 40 | 4
I can (kindof) do 1 or the other but not both:
a. Get 2014 sales > 30 (but unable to get their most recent "tv" sales ;(
SELECT * FROM mytable WHERE (SELECT (a->>'value')::float FROM
(SELECT jsonb_array_elements(data->'sales') as a) as b
WHERE a @> json_object(ARRAY['yr', '2014'])::jsonb) > 30
b. Get the avg 2013 expense (this needs to be avg tv expenses)
SELECT avg((a->>'value')::numeric) FROM
(SELECT jsonb_array_elements(data->'expenses') as a FROM mytable) as b
WHERE a @> json_object(ARRAY['yr', '2013'])::jsonb
EDIT: This will potentially be a very large table so any commentary on performance and indexing needs would be appreciated as I am new to both postgresql and jsonb.
EDIT #2: I've tried both answers and neither seem to be efficient for a large table ;(
This is a (rather lengthy) answer to your problem. The comments inside the query should explain the different pieces. The basic ideas that I followed are: 1) keep simple each operation, try first to produce the correct result, then optimize; 2) trasform as much as possible (but not to much) the json structure in a more "relational like" structure, since relations have more powerful operators that json data in postgres. Of corse there is space to simplify the query and even produce a more efficient version, but at least this is a starting point.