Column names not accessible in Exasol UDF

34 views Asked by At

Situation

I want to write an Exasol UDF that calculates scoring models with different parameters. It should also be possible to specify a model formula, e.g. Treat ~ Sex + Age + Income

Unfortunately, the column names of the input table are not accessible within the UDF. Although the official EXASOL documentation says so:

exa$meta$input_columns[]

Array including the following information: {name, type, sql_type, precision, scale, length}

The columns are simply numbered consecutively, as shown in the following reproducible example:

Reprex

--/
CREATE OR REPLACE R SET SCRIPT TEST_SCHEMA.TEST_COLUMNNAMES(...)  EMITS(COLNAMES CHAR(2000)) AS 

    run <- function(ctx) {
    
     ctx$next_row(NA) 
     numCols <- exa$meta$input_column_count
     inCols <- exa$meta$input_columns[1:numCols]
     
      ctx$emit(inCols)
    }
/

SELECT TEST_SCHEMA.TEST_COLUMNNAMES(CLUSTER_NAME, MEASURE_TIME, EVENT_TYPE, DBMS_VERSION, NODES, DB_RAM_SIZE, PARAMETERS) FROM EXA_STATISTICS.EXA_SYSTEM_EVENTS limit 10;

which results in:

COLNAMES
list(name = "0", type = "character", sql_type = "VARCHAR(128) UTF8", precision = NA, scale = NA, length = 128)
list(name = "1", type = "POSIXt", sql_type = "TIMESTAMP", precision = NA, scale = NA, length = NA)
list(name = "2", type = "character", sql_type = "VARCHAR(30) UTF8", precision = NA, scale = NA, length = 30)
list(name = "3", type = "character", sql_type = "VARCHAR(30) UTF8", precision = NA, scale = NA, length = 30)
list(name = "4", type = "integer", sql_type = "DECIMAL(4,0)", precision = 4, scale = 0, length = NA)
list(name = "5", type = "numeric", sql_type = "DECIMAL(10,1)", precision = 10, scale = 1, length = NA)
list(name = "6", type = "character", sql_type = "VARCHAR(100000) UTF8", precision = NA, scale = NA, length = 100000)

Question

Is there a way to access the column names of the input table in a UDF?

The example is an R-UDF, but I assume that this is also true for the other UDF scripting languages (Python, Java).

1

There are 1 answers

0
Alexander Lipatiev On

A simple test

create or replace python3 scalar script test.col_names(initial_column_name varchar(100)) emits (col_name varchar(100), col_type varchar(100)) as
def run(ctx):
    for x in exa.meta.input_columns:
        ctx.emit(x.name, x.sql_type)
;
with
sub as(
    select 'a' as new_name
)
SELECT
    test.col_names(s.new_name)
from
    sub s
;
COL_NAME           |COL_TYPE         |
-------------------+-----------------+
initial_column_name|VARCHAR(100) UTF8|

suggests that exa.meta.input_columns[] shows UDF parameters, not the names of the really passed arguments.

To be sure one might want to check the code in https://github.com/exasol/script-languages-release/tree/master. However, I doubt that runtime names could be accessible from inside the UDF. In particular, it is possible to call a UDF with constants like test.col_names('42') ...