How to read the function body of a Postgres standard-SQL function?

383 views Asked by At

In Postgres 14 a new kind of syntax was added to CREATE FUNCTION where the body of the function can be specified directly in SQL rather than being contained in a string. The string bodies of functions are easily accessible in pg_proc.prosrc, while the new function bodies appear to be stored in pg_proc.prosqlbody.

The prosqlbody column is of type pg_node_tree, which would normally get passed to pg_get_expr in order to turn it back into readable SQL. However, various attempts to call pg_get_expr on this column either return NULL or an error ('input is a query, not an expression'). Is this a bug, or is there some alternative to pg_get_expr that should be used in this case?

The only other alternative I can see is using pg_get_functiondef to get the entire CREATE FUNCTION statement, and then scraping past all of the initial function properties to find the function body. This would work, but it feels fragile and is more effort than I would be hoping for.

The context is that we're working on some code to generate update scripts between two Postgres databases - while it would be simpler to just work with the entire CREATE FUNCTION statement from pg_get_functiondef and stick that in a script like pg_dump does, when altering an existing function that may not be an option. So being able to access the function definition as individual parts helps a lot.

2

There are 2 answers

3
Erwin Brandstetter On BEST ANSWER

Advanced solution

There is a function pg_get_function_sqlbody(oid) in the source code. But it's not exposed in standard Postgres. If you have sufficient privileges (superuser does it) you can create a LANGUAGE internal function from it like this:

CREATE OR REPLACE FUNCTION pg_get_function_sqlbody(oid)
  RETURNS text
  LANGUAGE internal STABLE PARALLEL SAFE STRICT
AS 'pg_get_function_sqlbody';

It takes the OID of a function and does exactly what you are asking for: reconstructs (only) the sql_body of a standard-SQL function.
Any role can call this custom function like (among other ways):

SELECT pg_get_function_sqlbody('myschema.myfunc()'::regprocedure);

Tested in Postgres 15. Most probably works the same in Postgres 14.

Since the function is not documented, you cannot rely on compatibility across major versions. The project is free to drop or change it as needed (even if very unlikely).

Some explanation

Quoting the manual, pg_get_expr() only ...

Decompiles the internal form of an expression [...]

Bold emphasis mine.

The body of a standard-SQL function can either be a single statement:

RETURN expression

or a block:

BEGIN ATOMIC
  statement;
  statement;
  ...
  statement;
END

Either way, what's actually stored is a parse tree, not a simple expression. (Though either uses the data type pg_node_tree.) So pg_get_expr() cannot handle it, not even the "single statement" variant, and you get:

ERROR:  input is a query, not an expression 

(Or null when called with null as 2nd parameter.)

Related:

pg_get_functiondef() is your best option in standard Postgres. (Looks like a pretty good option to me.)

0
Morris de Oryx On

@ErwinBrandstetter, Thank you for this fantastic tip! I got the custom alias function up and running locally, no problem. On our RDS deploy, no superuser access, no joy.

But here's some good news: pg_get_function_sqlbody(old) works directly without the custom function on RDS. Here's a little script I'm working with while trying to move from 15.5 to 16.1.

/*
D'oh! Still failing on major version upgrades as pg_dump/pg_restore doesn't seem to handle our
dependency chains correctly. Mainly, if not exclusively, this is for our UPSERT system, were
each table+version has a custom view to implicitly define a compound type, and an UPSERT function
that takes my_table_v3[] input. Tom Lane put in a fix for this in PG 15, but I'm still not getting
simple upgrades. I have not put the time into building and testing a simplest proof database to
report the problem. I suck, stipulated.

Note that I'm only guessing that this is what's blocking my 15.5 -> 16.1 upgrades on RDS. During
the 14 -> 15 period, I uncovered some RDS bugs around this that they resolved, and they also agreed
to expose more logs. Unfortunately, the _details_ of why pg_dump or pg_upgrade have failed are
not visible. RDS generates an upgrade log that tells you the exact line and log file name to review,
but you have to open a ticket with RDS support to try and get at those details. At least, that's
how it was, and how it looks to be now.

Honestly, it's faster for me to drop everything, upgrade, and rebuild than to wait on RDS support.

*/

WITH 
begin_atomic_routines AS (

SELECT pronamespace::regnamespace               AS schema_name,
       proname                                  AS routine_name,
       pg_get_function_identity_arguments(oid)  AS argument_list,
       pg_get_function_sqlbody(oid)             AS stored_query -- This calls an undocumented internal function that's just what we need here. Thanks to Erwin Brandstetter!
              
  FROM pg_proc

-- https://www.postgresql.org/docs/current/catalog-pg-proc.html
 WHERE prosqlbody                        IS NOT NULL -- Uses BEGIN ATOMIC (SQL-standard block, first added in PG 14) instead of string literal.
   AND pronamespace::regnamespace::text       NOT IN ('pg_catalog','information_schema','extensions')
 ),
 
 -- It can help to have a squint at things first, including the reconstructed source query code.
 double_check_list AS (
  select * from begin_atomic_routines order by schema_name, routine_name, argument_list
),

drop_routines_list AS (
   SELECT 'DROP FUNCTION IF EXISTS ' || schema_name || '.' || routine_name || ' (' || argument_list || ');' AS drop_command
     FROM begin_atomic_routines
ORDER BY schema_name, routine_name, argument_list
)

-- Wrap these results in BEGIN....ROLLBACK to safely test for any lurking dependencies.
-- Then, you need to build a patch to manually restore all of the functions, after the upgrade.
-- select * from drop_routines_list   

 
 select * from drop_routines_list;

Later

I was writing this up for folks at work, and noticed that pg_get_functiondef() and pg_get_function_sqlbody() both work, at least with 15.5. The outputs aren't identical, pg_get_functiondef results include the basic CREATE OR REPLACE FUNCTION headers.

I'm guessing that this snuck into some dot release, or something. In any case, it gives me another option to think about.

select pg_get_functiondef      (11348905),
       pg_get_function_sqlbody (11348905);

regenerated code results

Regeneration function

Had a wander through more of pg_catalog and the system information functions, wrote up a routine to regenerate routines, base don details stored in the current database. I've turned up some anomalies in the generated code bodies, which are inconsistent across and within systems. Like, EXCLUDED. is changed to excluded_1 in a single UPSERT. Not tracking it down, not sure it's even something that I can track down. I'll use my source tree and manually build a patch to rebuild routines from there.

However, if you want to try and regenerate code, and only have SQL access to Postgres, this code has a bunch of bits and pieces that you may find useful. I enjoyed crawling around in there a bit...and now that itch is scratch.

------------------------------------
-- Function definition
------------------------------------
/*
The code below regenerates an existing routine, drawing all details from the database. 

Notes:
* This routine is wildly opinionated. 

* I've only tested this on functions, and only as far as I need it.

* I'm using ALTER ROUTINE, and similar, instead of ALTER FUNCTION/ALTER PROCEDURE.
  Why? Saves me from having to put in a bunch of branching code in a SQL CASE/switch
  off of pg_proc.prokind. Don't need it here. FYI, the current codes from the docs: 
  
     f for a normal function, 
     p for a procedure,
     a for an aggregate function
     w for a window function
  
* Postgres doesn't store in-code comments, but does support COMMENT ON.

* Language, volatility, SECURITY DEFINER, and parallel hints/contracts are all preserved. 

* The GRANTs here are _very_ unlikely to look like any explicit grants you've set, but they should match.

* Speaking of GRANTS, either I've over-complicated it, or it's over-complicated ;-) Nicer to stick 
  this bit into a function like regenerate_routine_grants(oid). Not doing that here.
  
* Speaking of functions, I'll set this query up as a function with $1 routine_oid oid as the argument list.

Bugs:
I found *one* function in my system where an ON CONFLICT was restored using excluded_1. instead of EXCLUDED.
This fails, don't know why this one particular case is translated out by pg_get_functiondef() and 
pg_get_function_sqlbody() both work this way. (Same code internally, presumably, for this bit.) And, yes,
this function uses BEGIN ATOMIC. As do 130+ other similar functions, all machine-generated.  So, test your
work with BEGIN.....all the things....ROLLBACK. Transactional DDL! Postgres _is_ amazing.

Corrections and Improvements:
...are most welcome.

*/

CREATE OR REPLACE FUNCTION dba.get_rebuild_routine_code(routine_oid_in oid)
  RETURNS text
  LANGUAGE sql 
  STRICT
 
AS $BODY$
 
WITH 

-------------------------------------------------------------------- 
-- pg_catalog views and functions are...cryptic. Pull out what's
-- needed and give each bit a simple label to make the later 
-- concatenation code more readable.
--------------------------------------------------------------------
routine_details AS (
  select pg_get_functiondef(oid)                   AS routine_body,     -- Works for BEGIN ATOMIC or string-literal function bodies.
         obj_description(oid, 'pg_proc' )          AS routine_comments, 
         pronamespace::regnamespace                AS schema_name,
         proname::text                             AS routine_name,
         pg_get_function_identity_arguments(oid)   AS argument_list,    -- This is what we need here, but see the docs for variants. 
         proowner::regrole::text                   AS owner_name,
         proacl                                    AS grants_list       -- Array of ACL items, we'll spend some time with this below.
                                 
    from pg_proc
   where oid = routine_oid_in
   -- (select 'tools.format_uuid(uuid)'::regprocedure) -- This will be $1 in a function. 
   --  Tip: Use the magic ::regprocedure casting to safely resolve a full routine path into its local OID:
   --       'public.example(uuid)'::regprocedure
          
), -- select * from routine_details


-------------------------------------------------------------------- 
-- GRANTS are stored in an ACL list (array). This takes more
-- unpacking, cleaning, and repacking than I know how to do
-- without a ton of code in SQL. Improvements welcome!
--------------------------------------------------------------------
/*
Too much unnesting to stick this in the CTE above, make two CTEs, one to unpack, the other to repack as GRANTs.
The unnested ACL list entries look like this sample:

group_api_users=X/user_bender

That's role=privilege/grantor. I think.

All this code does is strip off the =X/grantor_name bit. Routines only support the 'X' grant, so no more parsing needed. 

Note: There's either an easier way to reconstruct the GRANTS, or this gnarled up code should be hidden in a subroutine.
 
 Tip: Also see aclexplode() and the relevant bits and pieces in the docs.
*/

granted_role_names AS (
    select substring(unnest(grants_list)::text FROM '^([^=]+)=') as role_name
     from routine_details 
), -- select * from granted_role_names: This is a rowset with the roles in the ACL.

grant_role_line AS (
    -- GRANT EXECUTE ON FUNCTION dba.get_rebuild_routine_code(oid) TO rds_super;
   select 'GRANT EXECUTE ON FUNCTION '        || 
           schema_name || '.' || routine_name || 
           '(' || argument_list || ') TO '    || 
           role_name || ';' AS user_grant_code
           
        from routine_details 
cross join granted_role_names
  order by role_name
),

-- Collapse the rowset above down to a single text block.
combined_grants AS (
  select string_agg(user_grant_code, chr(10)) AS grants_block from grant_role_line
),--  select * from grant_line_code

-------------------------------------------------------------------- 
-- Put in a bit of conditional logic for elements that may be NULL,
-- add whitespace, etc.
--------------------------------------------------------------------
routine_components AS (
    
    SELECT 
    
    -----------------------------------------------------------------------
    -- Header
    -----------------------------------------------------------------------
     -- I'll be combining hundred(s) of routines, it's nice to have a visual break in the output at the top of each method.
        '--------------------------------------------------------------------------------'   || chr(10) ||   
        '-- ' || schema_name || '.' ||   routine_name || '(' || argument_list  || ')'        || chr(10) ||
        '--------------------------------------------------------------------------------'   || chr(10) ||   
        '-- Regenerated at: ' || now()                                         || chr(10)    || chr(10)      AS header,

    -----------------------------------------------------------------------
    -- CREATE OR REPLACE and full routine code
    -----------------------------------------------------------------------
    -- Note: This automatically includes LANGUAGE and any volatility level, 
    -- parallel promises, and SECURITY DEFINER settings.

    routine_body || ';' || chr(10)                                       AS create_or_replace_and_body,

    -----------------------------------------------------------------------
    -- COMMENT ON, if it's defined for this routine
    -----------------------------------------------------------------------
    /*
    COMMENT ON ROUTINE schema_name.routine_name(argument, list) IS
    $Comments here.$;
    */                                  
    CASE
    WHEN routine_comments IS NULL THEN ''
    
    ELSE chr(10) ||
     'COMMENT ON ROUTINE '                      ||
      schema_name                               || '.'       || routine_name ||
      '(' || argument_list|| ') IS '            || chr(10)   ||
      '''$' || REPLACE(routine_comments,'$','') || '$'';'  || chr(10)   || chr(10) -- The REPLACE is a total hack for _already_ dollar-quoted comments. YMMV.
     
    END                                                                  AS set_comments,
    
    -----------------------------------------------------------------------------
    -- ALTER FUNCTION...OWNER TO...
    -----------------------------------------------------------------------------
    /*
    ALTER FUNCTION dba.get_tables(text)
    OWNER TO user_bender;
    */
    'ALTER FUNCTION ' || 
    schema_name                        || '.'     || routine_name ||
    '(' || argument_list|| ')'         || chr(10) ||
    '    OWNER TO ' || owner_name      || ';'     || chr(10)             AS set_owner_name,
        
    -----------------------------------------------------------------------------
    -- GRANTs
    -----------------------------------------------------------------------------
    chr(10) || (SELECT grants_block FROM combined_grants) || chr(10)     AS grants_block

        
   FROM     routine_details    
) -- select * from routine_components

-------------------------------------------------------------------- 
-- We have arrived. Stitch everything together.
--------------------------------------------------------------------
select header                     ||
       create_or_replace_and_body || 
       set_comments               || 
       set_owner_name             ||
       grants_block                                                      AS rebuild_routine_code
       
  from routine_components
    
$BODY$;
  
 -----------------------------------------------------
 -- Comment
 -----------------------------------------------------
COMMENT ON FUNCTION dba.get_rebuild_routine_code(oid) IS
'Reconstruct a routine, based on the details stored in the current database.';

Here's an updated version of a sample script that builds out a bunch of lists and code blocks:

WITH 
begin_atomic_routines AS (

SELECT pronamespace::regnamespace               AS schema_name,
       proname                                  AS routine_name,
       pg_get_function_identity_arguments(oid)  AS argument_list, -- Better for ALTER FUNCTION, see https://www.postgresql.org/docs/current/functions-info.html
       pg_get_functiondef(oid)                  AS reconstructed_query,
       
       -- prokind char: f for a normal function, p for a procedure, a for an aggregate function, or w for a window function
       CASE prokind  -- This CASE syntax behaves like SWITCH in many languages.
           WHEN 'f' THEN 'Function'
           WHEN 'p' THEN 'Procedure'
           WHEN 'a' THEN 'Aggregate'
           WHEN 'a' THEN 'Window Function'
        ELSE
           'Unknown prokind char code of ' || prokind::text
        END AS routine_kind,
        
        get_rebuild_routine_code(oid) 
              
  FROM pg_proc

-- https://www.postgresql.org/docs/current/catalog-pg-proc.html
 WHERE prosqlbody                        IS NOT NULL -- Uses BEGIN ATOMIC (SQL-standard block, first added in PG 14) instead of string literal.
   AND pronamespace::regnamespace::text       NOT IN ('pg_catalog','information_schema','extensions')
 ),
 
 -- It can help to have a squint at things first, including the reconstructed source query code.
 double_check_list AS (
  select * from begin_atomic_routines order by schema_name, routine_name, argument_list
),

-- DROP ROUTINE? It's a generic for DROP FUNCTION or DROP PROCEDURE. All of our BEGIN ATOMIC code is implemented
-- as stored functions right now. But, you never know.
drop_routines_list AS (
   SELECT 'DROP ROUTINE IF EXISTS ' || schema_name || '.' || routine_name || ' (' || argument_list || ');' AS drop_command
     FROM begin_atomic_routines
ORDER BY schema_name, routine_name, argument_list
)

select * from double_check_list

-- Wrap these results in BEGIN....ROLLBACK; to safely test for any lurking dependencies.
-- Then, you need to build a patch to manually restore all of the functions, after the upgrade.
-- select * from drop_routines_list;