Say I have this pseudocode in Spark SQL, where t1 is a temp view built from partitioned Parquet files in HDFS and t2 is a small lookup file used to filter said temp view:
select t1.*
from t1
where exists (select *
              from t2
              where t1.id = t2.id
                and lower(t1.col) like lower(t2.pattern)) -- to mimic ILIKE functionality
Will the optimizer treat lower(t1.col) like lower(t2.pattern) as a case-insensitive match? Or will it evaluate the lower transformations on both columns before performing the match?
I don't have access to the DAG to see what exactly happens behind the scenes, so I'm asking here to see whether this is a known/documented optimization.
I tried to reproduce this case using Scala and then called explain() to get the physical plan (I'm fairly sure SQL and Scala end up with the same physical plan, because behind the scenes both go through the same optimizer, Catalyst). A sketch of the reproduction is below.
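Here is a minimal sketch of what I ran; the SparkSession setup and the sample data are made up for illustration:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lower

val spark = SparkSession.builder().master("local[*]").appName("lower-plan").getOrCreate()
import spark.implicits._

// Tuples get the default column names _1 and _2, which is why the plan
// shows lower(_1#3) = lower(_2#4).
val df = Seq(("Hello", "hello"), ("Spark", "flink")).toDF

// explain(true) prints the parsed, analyzed, optimized, and physical plans.
df.filter(lower($"_1") === lower($"_2")).explain(true)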
As you can see in the optimized logical plan, it calls lower on both sides every time it compares the two values: lower(_1#3) = lower(_2#4). In other words, Catalyst does not rewrite the expression into a case-insensitive match; the lower calls are evaluated as part of the comparison.
By the way, I tried the same thing by joining two DataFrames and then filtering on lower, and I got the same result (see the second sketch below).
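Continuing the same session, the join variant looks roughly like this (the column names are invented for the example):

// Join two small DataFrames, then filter on lower on both sides.
val left  = Seq((1, "Some Value"), (2, "other")).toDF("id", "col")
val right = Seq((1, "some value")).toDF("id", "pattern")

left.join(right, "id")
  .filter(lower($"col") === lower($"pattern"))
  .explain(true)
// The optimized plan again keeps lower(...) on both sides of the Filter,
// so there is no case-insensitive rewrite here either.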
I hope this answers your question.