azure data factory, expression builder for substring_index not working

75 views Asked by At

I am trying to build a return a value in azure data factory, pipeline expression builder, and its not working. From what I understand the engine behind it is apache sparx scala. if I look at the documentation for scala, substring_Index is one of the options. when I try and use substring_index, I get an error stating.... 'substring_index' is not a recognized function.

substringIndex('hello.com','.',1)

what am I not understanding here?

I've tried different variations of spellings for the function and nothing

screenshot

1

There are 1 answers

3
Rakesh Govindula On BEST ANSWER

Even though ADF runs on the Apache Spark Engine, it doesn't support all the functions of Spark. It has its own Expressions and functions, and currently, substring_index() is not supported in ADF expressions.

However, you can build an equivalent expression like below and use it as a workaround.

I have stored the input string 'www.apache.org' in a string variable mystring and count in an integer variable count.

Use this expression to get the desired results:

@if(
    greaterOrEquals(int(replace(string(variables('count')),'-','')),length(split(variables('mystring'),'.'))),
        variables('mystring'),
            if(greater(variables('count'),0),
                join(take(split(variables('mystring'),'.'),variables('count')),'.'),
                    join(skip(split(variables('mystring'),'.'),add(length(split(variables('mystring'),'.')),variables('count'))),'.')))

enter image description here

Result when count value is 2:

enter image description here

I have checked all the input test cases like negative count and count>count of matched chars, and it is giving expected results like substring_index() function in Spark.