Access files in the data lake via the open method (abfss)


In the past we used a mount point to read files from the data lake with open. We don't want to do that anymore; instead we want to use the external location path (abfss).

The code below is not working: No such file or directory.

with open('abfss://urlofcloudstorage/container/file.txt') as f:
    data = f.read()

I just became aware that the open method only works with local files and cannot read anything from abfss.

What would be a solution to read the file from the data lake? I have seen one option, dbutils.fs.cp, but I don't really want to copy the files locally. Any advice?

UPDATE: I also tried dbutils.fs.cp, but since I'm using a shared access mode cluster, it is not supported.

  def decrypt_csv_file_to_pandas(self, source_path, pgp_passphrase, csv_separator):
    """
    Decrypt a csv file directly into a pandas dataframe.
    """
    # open() only resolves local (or DBFS-mounted) paths, so an abfss:// URI fails here
    # with "No such file or directory".
    with open(source_path, 'rb') as f:
      decrypted = self.gpg.decrypt_file(
        file=f,
        passphrase=pgp_passphrase
      )
      print(decrypted.status)
      df_pd = pd.read_csv(io.StringIO(str(decrypted)), sep=csv_separator, low_memory=False, keep_default_na=False)
      return df_pd
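
For illustration, a hypothetical call that reproduces the error (the object name, secret scope and separator below are placeholders, not my actual values):

# Hypothetical usage: passing an abfss URI to the method above makes the
# built-in open() fail with "No such file or directory".
df = decryptor.decrypt_csv_file_to_pandas(
    source_path="abfss://urlofcloudstorage/container/file.txt",
    pgp_passphrase=dbutils.secrets.get("my-scope", "pgp-passphrase"),
    csv_separator=";",
)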

1 Answer

Answer by JayashankarGS:

Install the Python package adlfs in the Databricks library tab or use the command below:

pip install adlfs

Then, use the following code:

from adlfs import AzureBlobFileSystem

# Storage account key (redacted), plus the container and file to read.
key = "z9XY91xxxxxxxxxxxxxxxxxyyyyyyyyyyyy"
container_name = "data"
file_path = "pdf/titanic.csv"

# Authenticate against the storage account, then open the file like a local one.
abfs = AzureBlobFileSystem(account_name="jadls2", account_key=key)

with abfs.open(f"{container_name}/{file_path}", "r") as f:
    print(f.read())

Here, I have provided the account key in the configuration, but I would not recommend that. Instead, use a SAS token or a service principal.
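
For example, a minimal sketch of authenticating with a service principal or a SAS token instead (the tenant, client and secret values are placeholders you would normally pull from a Databricks secret scope):

from adlfs import AzureBlobFileSystem

# Sketch: service principal credentials instead of the account key.
# All values below are placeholders.
abfs = AzureBlobFileSystem(
    account_name="jadls2",
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",
)

# Or with a SAS token:
# abfs = AzureBlobFileSystem(account_name="jadls2", sas_token="<sas-token>")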

Check the adlfs documentation for more information on the arguments for different credential types.

Output:

(Screenshot of the notebook output showing the contents of titanic.csv printed.)

I am only printing the file data here; in your case, you would decrypt it and read it into a pandas DataFrame, as sketched below.
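
As a rough sketch (assuming self.gpg and the rest of your class stay as they are; the extra abfs parameter for an authenticated filesystem is my addition), your method could read the encrypted file through adlfs instead of the built-in open:

import io
import pandas as pd

def decrypt_csv_file_to_pandas(self, abfs, source_path, pgp_passphrase, csv_separator):
    """
    Decrypt a PGP-encrypted csv in the data lake directly into a pandas dataframe.
    abfs is an authenticated adlfs.AzureBlobFileSystem; source_path is "container/path/to/file.csv".
    """
    # Open the remote file in binary mode via adlfs instead of the built-in open().
    with abfs.open(source_path, "rb") as f:
        decrypted = self.gpg.decrypt_file(file=f, passphrase=pgp_passphrase)
        print(decrypted.status)
        return pd.read_csv(
            io.StringIO(str(decrypted)),
            sep=csv_separator,
            low_memory=False,
            keep_default_na=False,
        )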