PostgreSQL and joining using character varying[] fields


I've inherited a PostgreSQL 9.2.4 database, and while I have a fairly extensive background in SQL Server, I'm having trouble wrapping my head around a problem I've run into.

I have one table that has (among other things) three fields: "age_years", "age_months", and "age_days". If someone in the table is 2 months old or younger, they have a value in the "age_days" field for the number of days old they are. If they are older than 2 months but less than 3 years old, they have a value in the "age_months" field. Anyone 3 years old or older has a value in the "age_years" field.

A given record only has a non-zero value in one of those three fields. There will never be a situation where, for instance, age_days and age_years both have a non-zero value. These records represent hospital visits and the ages are the age of the individual at the time of the visit.

In another table I have several character varying[] fields with up to 20 values. They are ref_age_cd, ref_age, ref_clow, and ref_chigh. Here is an example record from that table (with fewer values than the max just for display purposes):

ref_age_cd        | ref_age                | ref_clow                          | ref_chigh
[D,D,D,M,M,Y,Y,Y] | [1,4,15,2,7,13,18,199] | [9.1,9.8,5.4,5.5,7.9,5.1,4.8,4.8] | [27.1,27.8,16.4,15.8,15.9,11.1,10.8,10.8]

The ref_age_cd field determines what kind of age you're looking at (days, months, or years), and ref_age holds the age value. Together they determine which position to read from ref_clow and ref_chigh. For example, if someone has a 13 in the age_months field, you find the 'M' entries in ref_age_cd, then look at the corresponding ref_age entries and pick the largest value that is lower than 13. That value is 7, at array index 5 (counting from 1), so you take the fifth value of ref_clow and ref_chigh as the low and high values: 7.9 and 15.9 respectively.

If someone were 10 days old, the array index would be 2 (ref_age_cd of 'D' and ref_age of 4), giving low and high values of 9.8 and 27.8. If they were 80 years old, the index would be 7 (ref_age_cd of 'Y' and ref_age of 18), giving low and high values of 4.8 and 10.8.
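To make the rule above concrete, here is a minimal sketch of the lookup for the 13-month example. It is only an illustration: the arrays are typed literals copied from the example record (the real columns are character varying[] and would need casts), and the unit 'M' and age 13 are hard-coded stand-ins for one patient. It uses generate_subscripts in the SELECT list, which should work on 9.2.

```sql
-- Sketch: find the array index for a patient who is 13 months old.
WITH ref AS (
  SELECT ARRAY['D','D','D','M','M','Y','Y','Y']          AS ref_age_cd
       , ARRAY[1,4,15,2,7,13,18,199]                     AS ref_age
       , ARRAY[9.1,9.8,5.4,5.5,7.9,5.1,4.8,4.8]          AS ref_clow
       , ARRAY[27.1,27.8,16.4,15.8,15.9,11.1,10.8,10.8]  AS ref_chigh
)
SELECT i            AS array_index
     , ref_clow[i]  AS low
     , ref_chigh[i] AS high
  FROM (
        -- One row per array position; i is the 1-based subscript.
        SELECT ref.*, generate_subscripts(ref_age_cd, 1) AS i
          FROM ref
       ) s
 WHERE ref_age_cd[i] = 'M'   -- same unit as the patient's age (hard-coded)
   AND ref_age[i] < 13       -- strictly lower than the patient's age (hard-coded)
 ORDER BY ref_age[i] DESC    -- largest qualifying ref_age first
 LIMIT 1;
```

With the example record this returns array_index 5, low 7.9, and high 15.9, matching the walk-through above. Swapping in 'D' and 10, or 'Y' and 80, gives indexes 2 and 7.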

I just can't figure out how to program this so when I join from table A (with the age_days, age_months, or age_years fields) to the reference table I can pull the right array index for ref_clow and ref_chigh.

I should also mention that I have no ability to make any changes to this database. I need to make this work with what I've been given.
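One consequence of not being able to change the schema: since all four columns are character varying[], their elements are strings, so the age comparison has to cast at query time or it will compare text. A small sketch of the difference, assuming every element is a well-formed number (the literals below just mimic two of the columns):

```sql
-- Sketch only: string elements versus cast elements.
WITH ref AS (
  SELECT '{1,4,15,2,7,13,18,199}'::character varying[]            AS ref_age
       , '{9.1,9.8,5.4,5.5,7.9,5.1,4.8,4.8}'::character varying[] AS ref_clow
)
SELECT ref_age[3]                AS age_as_text   -- the string '15'
     , ref_age[3] < '4'          AS text_compare  -- string ordering: '15' sorts before '4'
     , (ref_age::int[])[3] < 4   AS int_compare   -- numeric ordering: 15 < 4 is false
     , (ref_clow::numeric[])[5]  AS low           -- 7.9 as a number
  FROM ref;
```

Casting the whole array once (ref_age::int[]) and then subscripting it keeps the rest of the query free of per-element casts.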


There are 2 answers

UkrainianSpider On

This ended up doing the trick. Posted so others might be able to use it.

--test data in first two "with" statements
with a AS (
  select 1 AS patient_nr, CAST(2 AS INT) AS age_days, CAST(NULL AS INT) AS age_months, CAST(NULL AS INT) AS age_years
  UNION ALL
  select  2 AS patient_nr, CAST(16 AS INT) AS age_days, CAST(NULL AS INT) AS age_months, CAST(NULL AS INT) AS age_years
  UNION ALL
  select  3 AS patient_nr, CAST(NULL AS INT) AS age_days, CAST(13 AS INT) AS age_months, CAST(NULL AS INT) AS age_years
  UNION ALL
  select  4 AS patient_nr, CAST(10 AS INT) AS age_days, CAST(NULL AS INT) AS age_months, CAST(NULL AS INT) AS age_years
  UNION ALL
  select  5 AS patient_nr, CAST(NULL AS INT) AS age_days, CAST(NULL AS INT) AS age_months, CAST(80 AS INT) AS age_years
), b as (
  SELECT ARRAY['D','D','D','M','M','Y','Y','Y'] AS ref_age_cd
       , ARRAY[1,4,15,2,7,13,18,199] AS ref_age
       , ARRAY[9.1,9.8,5.4,5.5,7.9,5.1,4.8,4.8] AS ref_clow
       , ARRAY[27.1,27.8,16.4,15.8,15.9,11.1,10.8,10.8] AS ref_chigh
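), refTable AS (
  -- Expand the four arrays in parallel: one row per array position.
  -- On 9.2 the unnest calls stay in lockstep only because the arrays have equal length.
  SELECT unnest(ref_age_cd) AS ref_age_cd
       , unnest(ref_age)    AS ref_age
       , unnest(ref_clow)   AS ref_clow
       , unnest(ref_chigh)  AS ref_chigh
    FROM b
), res AS (
  -- Match each patient to every reference row of the same unit with a smaller ref_age,
  -- then number the matches from the largest ref_age down.
  SELECT a.*, rt.*
       , ROW_NUMBER() OVER (PARTITION BY a.patient_nr ORDER BY rt.ref_age DESC) AS rn
    FROM a
    LEFT JOIN refTable rt ON (rt.ref_age_cd = 'D' AND a.age_days   > rt.ref_age)
                          OR (rt.ref_age_cd = 'M' AND a.age_months > rt.ref_age)
                          OR (rt.ref_age_cd = 'Y' AND a.age_years  > rt.ref_age)
)
-- rn = 1 is the closest reference row below the patient's age.
SELECT *
  FROM res
 WHERE rn = 1;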
Usagi Miyamoto On

For a single patient, try something like this:

/* Creating test environment
CREATE TABLE refs (
  id serial NOT NULL,
  ref_age_cd character(1)[],
  ref_age integer[],
  ref_clow double precision[],
  ref_chigh double precision[],
  CONSTRAINT refs_pkey PRIMARY KEY (id)
);
INSERT INTO refs(ref_age_cd, ref_age, ref_clow, ref_chigh)
       VALUES ('{"D","D","D","M","M","Y","Y","Y"}',
               '{1,4,15,2,7,13,18,199}',
               '{9.1,9.8,5.4,5.5,7.9,5.1,4.8,4.8}',
               '{27.1,27.8,16.4,15.8,15.9,11.1,10.8,10.8}');
CREATE TABLE pats (
  id serial NOT NULL,
  name varchar(255) NOT NULL,
  age_years integer,
  age_months integer,
  age_days integer,
  CONSTRAINT pats_pkey PRIMARY KEY (id)
);
INSERT INTO pats
       VALUES (DEFAULT, 'newborn', NULL, NULL, 10),
              (DEFAULT, 'baby', NULL, 13, NULL),
              (DEFAULT, 'adult', 80, NULL, NULL);
*/

-- Replace filters here to select only one row...
WITH tt AS ( SELECT * FROM refs WHERE id = 1 )
SELECT w.*, ref_clow, ref_chigh
FROM ( SELECT row_number() OVER () AS nr, unnest AS ref_age_cd
       FROM UNNEST( (SELECT ref_age_cd FROM tt ) ), tt ) q1
JOIN ( SELECT row_number() OVER () AS nr, unnest AS ref_age
       FROM UNNEST( (SELECT ref_age FROM tt ) ), tt ) q2 USING ( nr )
JOIN ( SELECT row_number() OVER () AS nr, unnest AS ref_clow
       FROM UNNEST( (SELECT ref_clow FROM tt ) ), tt ) q3 USING ( nr )
JOIN ( SELECT row_number() OVER () AS nr, unnest AS ref_chigh
       FROM UNNEST( (SELECT ref_chigh FROM tt ) ), tt ) q4 USING ( nr )
JOIN ( SELECT id, name, age_years, age_months, age_days,
              CASE WHEN age_years IS NOT NULL THEN 'Y'
                   WHEN age_months IS NOT NULL THEN 'M'
                   WHEN age_days IS NOT NULL THEN 'D' END AS ref_age_cd,
              CASE WHEN age_years IS NOT NULL THEN age_years
                   WHEN age_months IS NOT NULL THEN age_months
                   WHEN age_days IS NOT NULL THEN age_days END AS age
       -- Replace filters here to select only one row...
       FROM pats WHERE id = 2
     ) w USING (ref_age_cd)
WHERE ref_age <= age
ORDER BY ref_age DESC
LIMIT 1;

Outputs:

2;"baby";<NULL>;13;<NULL>;"M";13;7.9;15.9
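
A possible extension, not part of the original answer: the same lookup for every patient at once, returning the array index the question asks for. This is a sketch under several assumptions: the CTEs pats and refs are inline stand-ins for the test tables above, there is a single reference row, the arrays are already typed (character varying[] columns would need casts such as ref_age::int[]), and the comparison is strict (<) as described in the question, where the query above uses <=. It avoids LATERAL and WITH ORDINALITY, which 9.2 does not have.

```sql
-- Inline stand-ins for the pats and refs test tables.
WITH pats(id, name, age_years, age_months, age_days) AS (
  VALUES (1, 'newborn', NULL::int, NULL::int, 10)
       , (2, 'baby',    NULL,      13,        NULL)
       , (3, 'adult',   80,        NULL,      NULL)
), refs AS (
  SELECT ARRAY['D','D','D','M','M','Y','Y','Y']          AS ref_age_cd
       , ARRAY[1,4,15,2,7,13,18,199]                     AS ref_age
       , ARRAY[9.1,9.8,5.4,5.5,7.9,5.1,4.8,4.8]          AS ref_clow
       , ARRAY[27.1,27.8,16.4,15.8,15.9,11.1,10.8,10.8]  AS ref_chigh
), p AS (
  -- Collapse the three age columns into a unit code and a single value.
  -- "> 0" treats both NULL and 0 as "not set".
  SELECT id, name
       , CASE WHEN age_years  > 0 THEN 'Y'
              WHEN age_months > 0 THEN 'M'
              WHEN age_days   > 0 THEN 'D' END AS age_cd
       , CASE WHEN age_years  > 0 THEN age_years
              WHEN age_months > 0 THEN age_months
              WHEN age_days   > 0 THEN age_days END AS age
    FROM pats
), r AS (
  -- One row per array position; i is the 1-based subscript.
  SELECT refs.*, generate_subscripts(ref_age_cd, 1) AS i
    FROM refs
), ranked AS (
  SELECT p.id, p.name
       , r.i              AS array_index
       , r.ref_clow[r.i]  AS low
       , r.ref_chigh[r.i] AS high
       , row_number() OVER (PARTITION BY p.id ORDER BY r.ref_age[r.i] DESC) AS rn
    FROM p
    LEFT JOIN r ON r.ref_age_cd[r.i] = p.age_cd
               AND r.ref_age[r.i]    < p.age
)
SELECT id, name, array_index, low, high
  FROM ranked
 WHERE rn = 1
 ORDER BY id;
```

For the three test patients this gives indexes 2, 5, and 7 with (low, high) of (9.8, 27.8), (7.9, 15.9), and (4.8, 10.8), the same values as the examples in the question. With more than one reference row, the join would also need whatever key ties a visit to its reference record.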