How do you inspect the candidate logical plans of a cost-based SQL optimizer in Spark (Scala)?


For a project, I want to select the top-K resolved logical plans for a given SQL query in Spark, ranked by a cost-based optimizer. Is anyone aware of a cost-based optimizer for Spark SQL that computes several candidate plans, so that I could choose the top K of them by some expected-cost measure?

I am aware that I can get multiple physical plans with the standard Catalyst optimizer, but I specifically want to choose among logical plans. Since the Catalyst optimizer is rule-based, this does not seem possible.

I looked into the Apache Calcite optimizer (https://calcite.apache.org/), but I could not find a way to make it return the candidate plans.

As an example of the desired output, for the query SELECT c.Name, b.Name FROM banks b, customers c WHERE c.bankId = b.Id AND b.city = "New York" I would expect, for instance, two logical plans: plan A performs the join before filtering on city, and plan B filters on city before the join.
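To make the example concrete, here is a minimal sketch of what I mean. It does not get the candidates from an optimizer: both plans are written by hand through the DataFrame API, and the sample data, the object name, and the choice of `stats.sizeInBytes` as the cost measure are my own assumptions for illustration. It only shows the kind of output I am after, using the analyzed (resolved) logical plan that Spark exposes through `queryExecution`.

```scala
import org.apache.spark.sql.SparkSession

object CandidatePlans {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("candidate-plans")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data, only there so the plans can be resolved.
    val banks = Seq((1, "First Bank", "New York"), (2, "Second Bank", "Boston"))
      .toDF("Id", "Name", "city").as("b")
    val customers = Seq(("Alice", 1), ("Bob", 2))
      .toDF("Name", "bankId").as("c")

    val joinCond = $"c.bankId" === $"b.Id"
    val cityPred = $"b.city" === "New York"

    // Plan A: join first, then filter on city.
    val planA = banks.join(customers, joinCond)
      .filter(cityPred)
      .select($"c.Name", $"b.Name")

    // Plan B: filter on city first, then join.
    val planB = banks.filter(cityPred)
      .join(customers, joinCond)
      .select($"c.Name", $"b.Name")

    val candidates = Seq(
      "A: join, then filter" -> planA,
      "B: filter, then join" -> planB
    )

    // Rank the resolved logical plans by Spark's own size estimate.
    // Assumption: sizeInBytes stands in for the "expected cost measure".
    val k = 1
    val topK = candidates
      .map { case (label, df) => (label, df.queryExecution.analyzed) }
      .sortBy { case (_, plan) => plan.stats.sizeInBytes }
      .take(k)

    topK.foreach { case (label, plan) =>
      println(s"$label (estimated bytes: ${plan.stats.sizeInBytes})")
      println(plan)
    }

    spark.stop()
  }
}
```

The part I am missing is the enumeration step: I would like an optimizer to produce the `candidates` sequence for me instead of writing each variant by hand. Looking at `queryExecution.optimizedPlan` does not help here, because I would expect Catalyst's predicate pushdown to rewrite both variants into the same single plan.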

My preference would be to do this in Spark SQL, either with an optimizer other than Catalyst or with Catalyst itself if that is possible. Suggestions for optimizers in other languages that allow this would also be appreciated!

