How to use Databricks dbx with an Azure VPN?


I am using dbx to deploy and launch jobs on ephemeral clusters on Databricks. I initialized the cicd-sample-project and connected it to a fresh, empty Databricks free-trial environment, and everything works: I can successfully deploy the Python package with python -m dbx deploy cicd-sample-project-sample-etl --assets-only and execute it with python -m dbx launch cicd-sample-project-sample-etl --from-assets --trace.

When I try to launch the exact same job in my company's Databricks environment, the deploy command goes through. The only difference is that my company's Databricks environment connects to Azure through a VPN.

Therefore, I added some rules to my firewall (see the firewall-rules screenshots: firewall_rules_img, firewall_rules_2_img), but when I issue the dbx launch command I get an error (error_node_img), and the following message appears in the log:

WARN MetastoreMonitor: Failed to connect to the metastore InternalMysqlMetastore(DbMetastoreConfig{host=consolidated-westeuropec2-prod-metastore-3.mysql.database.azure.com, port=3306, dbName=organization5367007285973203, user=[REDACTED]}). (timeSinceLastSuccess=0)
org.skife.jdbi.v2.exceptions.UnableToObtainConnectionException: java.sql.SQLTransientConnectionException: metastore-monitor - Connection is not available, request timed out after 15090ms.
    at org.skife.jdbi.v2.DBI.open(DBI.java:230)
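Since the error is a plain connection timeout, a minimal sketch like the following (run from a notebook or job on the cluster) can confirm whether the metastore host is reachable over TCP at all. The host name is copied from the error log above; the helper itself is just an illustrative connectivity probe, not part of dbx.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Host name taken from the MetastoreMonitor error; 3306 is the MySQL port.
host = "consolidated-westeuropec2-prod-metastore-3.mysql.database.azure.com"
print("metastore reachable:", can_connect(host, 3306))
```

If this prints False from the cluster but True from a machine outside the VPN, the firewall/VPN rules are blocking the cluster's egress to the metastore.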

I am not even trying to write to the metastore; I am just logging some data:

from cicd_sample_project.common import Task


class SampleETLTask(Task):
    def launch(self):
        self.logger.info("Launching sample ETL task")
        self.logger.info("Sample ETL task finished!")


def entrypoint():  # pragma: no cover
    task = SampleETLTask()
    task.launch()


if __name__ == "__main__":
    entrypoint()

Has anyone encountered the same problem? Were you able to use Databricks dbx with an Azure VPN? Please let me know, and thanks for your help.

PS: If needed, I can provide the full log.


1 Answer

Alex Ott

In your case the egress traffic isn't configured correctly. This is not a dbx problem but a general Databricks networking problem: make sure outgoing traffic is allowed to the ports and destinations described in the Azure Databricks documentation (user-defined route / firewall requirements).
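To act on this, you can probe each required destination from the cluster. The sketch below assumes a small, illustrative endpoint list; the metastore host comes from the error log, while the workspace URL is a hypothetical placeholder, and the real, complete list of required hosts and ports must be taken from the Azure Databricks networking documentation for your region.

```python
import socket

# Illustrative (description, host, port) pairs; substitute the destinations
# from the Azure Databricks docs for your region. The workspace URL below
# is a made-up placeholder, not a real workspace.
ENDPOINTS = [
    ("metastore (MySQL)",
     "consolidated-westeuropec2-prod-metastore-3.mysql.database.azure.com",
     3306),
    ("workspace / REST API (hypothetical URL)",
     "adb-0000000000000000.0.azuredatabricks.net",
     443),
]


def check(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection and report 'OK' or the blocking error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "OK"
    except OSError as exc:
        return f"BLOCKED ({exc})"


for name, host, port in ENDPOINTS:
    print(f"{name}: {host}:{port} -> {check(host, port)}")
```

Any endpoint reported as BLOCKED from the cluster needs a corresponding allow rule in the VPN/firewall configuration.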