access azure files using azure databricks pyspark


I am trying to access a file with the .rds extension. I am using the code below, but it is not working.

import pandas as pd

url_sas_token = 'https://<my account name>.file.core.windows.net/test/test.rds?st=2020-01-27T10%3A16%3A12Z&se=2020-01-28T10%3A16%3A12Z&sp=rl&sv=2018-03-28&sr=f&sig=XXXXXXXXXXXXXXXXX'
# Directly read the file content from its url with sas token to get a pandas dataframe
pdf = pd.read_excel(url_sas_token)
# Then, to convert the pandas dataframe to a PySpark dataframe in Azure Databricks
df = spark.createDataFrame(pdf)

1 Answer

Bhavani (Best Answer)

I created a storage account, created a file share, and uploaded the .rds file into the file share.
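The share and upload can also be done from code instead of the portal; below is a minimal sketch using the azure-storage-file SDK, with placeholder account name, key, and local path:

from azure.storage.file import FileService

# Placeholder credentials and names for illustration
file_service = FileService(account_name='<my account name>',
                           account_key='<my account key>')

# Create the file share (does not fail if it already exists)
file_service.create_share('test')

# Upload the local .rds file into the root of the share
file_service.create_file_from_path(
    share_name='test',
    directory_name=None,
    file_name='test.rds',
    local_file_path='/path/to/test.rds')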

I generated a SAS key in the storage account.
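The same SAS URL can also be produced programmatically with the azure-storage-file package installed below; a minimal sketch, again with a placeholder account name and key:

from datetime import datetime, timedelta
from azure.storage.file import FilePermissions, FileService

file_service = FileService(account_name='<my account name>',
                           account_key='<my account key>')

# Read-only SAS token valid for 24 hours
sas_token = file_service.generate_file_shared_access_signature(
    share_name='test',
    directory_name=None,
    file_name='test.rds',
    permission=FilePermissions.READ,
    expiry=datetime.utcnow() + timedelta(hours=24))

# Full file URL with the SAS token appended
url_sas_token = file_service.make_file_url(
    'test', None, 'test.rds', sas_token=sas_token)
print(url_sas_token)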

I installed the azure-storage-file package (the Azure Files SDK) in Databricks using

pip install azure-storage-file 

I installed the pyreadr package to load the .rds file using

pip install pyreadr


Then I loaded the .rds file in Databricks using

import pyreadr
from urllib.request import urlopen

url_sas_token = "<File Service SAS URL>"

# Download the file bytes over HTTPS using the SAS URL
response = urlopen(url_sas_token)
content = response.read()

# Write the bytes to a local file so pyreadr can parse it
with open('counties.rds', 'wb') as fhandle:
    fhandle.write(content)

# read_r returns an OrderedDict of pandas DataFrames
result = pyreadr.read_r("counties.rds")
print(result)

In the above code I set url_sas_token to the File Service SAS URL.


The above code loaded the .rds file data successfully.

In this way, I accessed the .rds file stored in the Azure file share from Databricks.
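Since the question ultimately wants a PySpark DataFrame, the pyreadr result can be converted in the Databricks notebook; a minimal sketch, assuming spark is the notebook's SparkSession and noting that pyreadr stores the single object of an .rds file under the key None:

# .rds files hold a single unnamed object, which pyreadr keys by None
pdf = result[None]

# Convert the pandas DataFrame to a PySpark DataFrame
df = spark.createDataFrame(pdf)
df.show()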