Procedure for metaquery in pure SQL / PostgreSQL

170 views Asked by At

Given the following:

CREATE TABLE filters (
id BIGINT NOT NULL
, name TEXT NOT NULL
, range NUMRANGE NOT NULL
, UNIQUE (id, name)
);

INSERT INTO filters (id,name,range) VALUES (99,'value1','[45,90]'::NUMRANGE);
INSERT INTO filters (id,name,range) VALUES (99,'value2','[15,50]'::NUMRANGE);
INSERT INTO filters (id,name,range) VALUES (23,'value1','[45,90]'::NUMRANGE);

CREATE TABLE data (
value1 NUMERIC
, value2 NUMERIC
);

EDIT - Added sample data rows

// filter.id = 99, neither value1 or value2 match this row
INSERT INTO data (value1,value2) VALUES (40.01, 11.12); 

// filter.id = 99, only value2 matches this row
INSERT INTO data (value1,value2) VALUES (15.48, 20.14);

// filter.id = 99, only value1 matches this row
INSERT INTO data (value1,value2) VALUES (53.48, 70.14);

// filter.id = 99, both value1 and value2 match this row
INSERT INTO data (value1,value2) VALUES (64.12, 33.48);

The filters table consists of a series of filters, grouped by id.

I want to apply a specific group of filters (i.e. filters.id = 99) to the data table where each row of filters would apply filters.range to the column of data that matched filters.name.

Originally, the schema included a 1:1 column duplication in filters and data, such that a join could be accomplished with filters.col3 = data.col3 or by a ranged comparison. The problem is I know have over 1000 filters I'd like to track in the filters table and apply against the data table.

Tried some googling but not sure how to phrase this type of question. I know I could just form the queries in an external language, but I'd like to execute this in pure SQL/PostgreSQL if possible.

1

There are 1 answers

0
Erwin Brandstetter On BEST ANSWER

You need dynamic SQL for that.

Add a surrogate PK to your table for several reasons, one of which being a stable sort order for filters within the same group. Another: always add a PK.

CREATE TABLE filter (
   filter_id serial PRIMARY KEY  -- add surrogate PK
 , filter_grp int NOT NULL
 , name   text NOT NULL
 , range  numrange NOT NULL
);

This function does it all:

CREATE OR REPLACE FUNCTION f_get_data(_filter_grp int)
  RETURNS SETOF data AS
$func$
DECLARE
   _where text;
   _arr   numrange[];
BEGIN

SELECT string_agg(name || ' <@ $1[' || rn || ']', ' AND ') -- AND?
      ,array_agg(range)
INTO   _where, _arr
FROM  (
   SELECT name, range
         ,row_number() OVER (ORDER BY filter_id) AS rn
   FROM   filter
   WHERE  filter_grp = _filter_grp
   ORDER  BY filter_id
   ) sub;

-- RAISE NOTICE '%', 'SELECT * FROM data WHERE ' || _where;
RETURN QUERY EXECUTE 'SELECT * FROM data WHERE ' || _where
USING  _arr;

END
$func$ LANGUAGE plpgsql;

Call:

SELECT * FROM f_get_data(99);

This builds and executes a query of the form:

SELECT * FROM data
WHERE  value1 <@'[45,90]'::numrange
AND    value2 <@'[15,50]'::numrange;

SQL Fiddle.

Explain

  • SELECT * FROM data can conveniently be coupled with SETOF data as return type of the function.

  • Avoid converting values to text and back when building the query string. Instead provide values with the USING clause, and use $n placeholders in the query sting for EXECUTE.
    In this particular case, the number of values is also dynamic. We can dodge the bullet by using array indices which automatically correspond to the position of the value in the aggregated array.