UDF Exasol function with Python code do not work

58 views Asked by At

I would like to create a UDF for an Exasol database using Python code in order to calculate the auc score.

The SQL Script looks the following:

--/
CREATE OR REPLACE PYTHON3 SCALAR  SCRIPT TEST_SCHEMA.IDA_EVALUATE_RESULTS(FLAG_LEBEND INT, MASTER_PD_MITT DECIMAL(11,7))
    EMITS (AUC DECIMAL(11,7)) AS

import pandas as pd
from sklearn.metrics import roc_auc_score
import numpy as np
from scipy.stats import norm

def calculate_auc_gini(y_emp, y_proba):
    auc = roc_auc_score(y_emp, y_proba)
    return auc

def run(ctx):

    auc = roc_auc_score(ctx.FLAG_LEBEND, ctx.MASTER_PD_MITT)
    ctx.emit(auc)

/


SELECT TEST_SCHEMA.IDA_EVALUATE_RESULTS(FLAG_LEBEND, MASTER_PD_MITT) FROM (SELECT * FROM TEST_SCHEMA.ROE_TEST_TABLE_FULL WHERE ROWNUM <= 10000);

I tested the Python code with the same data in Python itself by importing the data into Python and it worked. However when I run the SQL script I get the following error:

[22002] VM error: F-UDF-CL-LIB-1127: F-UDF-CL-SL-PYTHON-1002: F-UDF-CL-SL-PYTHON-1026: ExaUDFError: F-UDF-CL-SL-PYTHON-1114: Exception during run IDA_EVALUATE_RESULTS:12 run File "/usr/lib/python3/dist-packages/sklearn/metrics/ranking.py", line 277, in roc_auc_score sample_weight=sample_weight) File "/usr/lib/python3/dist-packages/sklearn/metrics/base.py", line 70, in _average_binary_sc ...

The prediction column (MASTER_PD_MITT) have values between 0-1 and the realiziation column FLAG_LEBEND have only 1 or zero values. Both classes are available in the selected data.

I dont know why the SQL Script is not working and the same script in Python is working.

1

There are 1 answers

0
MarkusN On

You're using the wrong type of UDF. "SCALAR" means that the function is applied on every line, i.e. "y_emp" and "y_proba" are not vectors but individual values.

You have to define UDF type as «SET». Now you have to iterate over all rows of the input table, collect the values of the columns "y_emp" and "y_proba" and then pass them to the aggregate function roc_auc_Score().