In the past we used the mount point to read files from the data lake with open(). Now we no longer want to do that and instead want to use the external location path (abfss).
The code below is not working: No such file or directory
with open('abfss://urlofcloudstorage/container/file.txt') as f:
    data = f.read()
I just became aware that open() only works with local files and cannot read anything from an abfss path.
What would be a solution to read the file from the data lake? I have seen one option, dbutils.fs.cp, but I don't really want to copy the files locally. Any advice?
UPDATE: I also tried dbutils.fs.cp, but since I am using a shared access mode cluster, it is not supported.
For reference, this is the function that currently reads the file with open():
def decrypt_csv_file_to_pandas(self, source_path, pgp_passphrase, csv_separator):
    """
    Decrypt a csv file directly into a pandas dataframe.
    """
    with open(source_path, 'rb') as f:
        decrypted = self.gpg.decrypt_file(
            file=f,
            passphrase=pgp_passphrase
        )
    print(decrypted.status)
    df_pd = pd.read_csv(io.StringIO(str(decrypted)), sep=csv_separator, low_memory=False, keep_default_na=False)
    return df_pd
Install the Python package adlfs in the Databricks library tab or use the command below:

pip install adlfs

Then, use the following code:
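Roughly something like the sketch below: it builds an adlfs AzureBlobFileSystem and reads the file through it. The storage account name, key, container, and file path are placeholders you need to replace with your own values; depending on the adlfs version, full abfss://container@account.dfs.core.windows.net/... URLs may also be accepted as paths.

from adlfs import AzureBlobFileSystem

# Placeholders: replace with your storage account name and key.
fs = AzureBlobFileSystem(
    account_name="<storage-account-name>",
    account_key="<account-key>",
)

# fs.open() returns a file-like object, so it behaves like the built-in open().
with fs.open("<container>/path/to/file.txt", "r") as f:
    print(f.read())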
Here, I have provided the key while configuring, but I would not recommend that. Instead, use a SAS token or service principal. Check this for more information on arguments for different credentials.
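For example, with a service principal it would look something like this (tenant ID, client ID, and client secret are placeholders; check the adlfs docs linked above for the exact argument names, and ideally pull the secret from a Databricks secret scope rather than hard-coding it):

from adlfs import AzureBlobFileSystem

# Placeholders: service principal credentials; e.g. read the secret with
# dbutils.secrets.get(scope="my-scope", key="sp-client-secret") instead of hard-coding it.
fs = AzureBlobFileSystem(
    account_name="<storage-account-name>",
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",
)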
Output:
I am only printing the file data here. In your case, you should decrypt it and read it into a pandas DataFrame, as in the sketch below.
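Applied to your helper, that means opening the source path through the adlfs filesystem instead of the local one. This is only a sketch and assumes the class keeps a configured AzureBlobFileSystem instance as self.fs:

def decrypt_csv_file_to_pandas(self, source_path, pgp_passphrase, csv_separator):
    """
    Decrypt a csv file from the data lake directly into a pandas dataframe.
    """
    # self.fs is assumed to be an adlfs.AzureBlobFileSystem built with your credentials;
    # source_path is the "container/path/to/file.csv.pgp" style path inside the storage account.
    with self.fs.open(source_path, 'rb') as f:
        decrypted = self.gpg.decrypt_file(f, passphrase=pgp_passphrase)
    print(decrypted.status)
    df_pd = pd.read_csv(
        io.StringIO(str(decrypted)),
        sep=csv_separator,
        low_memory=False,
        keep_default_na=False,
    )
    return df_pd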