Writing blob content into Microsoft Fabric using Python/PySpark notebooks


My existing solution reads a base64 string and writes it as a file to Azure Blob Storage:

    import base64

    from azure.storage.blob import BlobServiceClient
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BinaryType

    # Initialize Azure Blob Service Client
    connection_string = "DefaultEndpointsProtocol=https;AccountName=xxxxx;AccountKey=xxxxxxx;EndpointSuffix=core.windows.net"  # Replace with your connection string
    container_name = "sandpit/Attachments"  # Replace with your container name
    blob_service_client = BlobServiceClient.from_connection_string(connection_string)

    def write_file_to_blob(data, filename):
        # Upload the bytes as a blob, replacing any existing blob with the same name
        blob_client = blob_service_client.get_blob_client(container=container_name, blob=filename)
        blob_client.upload_blob(data, overwrite=True)

    # UDF to decode base64
    def decode_base64(base64_str):
        return base64.b64decode(base64_str)

    # Register UDF
    decode_udf = udf(decode_base64, BinaryType())

and I was calling it like this:

    collected_data = df_with_decoded_data.collect()

    # Write each file to blob storage
    for row in collected_data:
        write_file_to_blob(row['DecodedData'], row['FinalFileName'])

Now I want to move this to OneLake. How do I establish a connection to the OneLake Files folder and perform the same task?

What kind of credentials does OneLake expect?


1 answer

Answer by Sreedhar

I got this working without passing any explicit credentials, because the user account running the notebook had full access to the lakehouse's /Files directory. Since this was a one-time task, I did not go on to integrate it with a Service Principal or Managed Identity.
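If a Service Principal is needed instead, a rough sketch might look like the following. It was not part of what was actually run here. It assumes the `azure-identity` and `azure-storage-file-datalake` packages are installed, that the Service Principal has been granted access to the workspace, and that OneLake is reached through its ADLS Gen2-compatible endpoint with the workspace as the file system. The function names, `workspace_name`, `lakehouse_name` and the credential parameters are all placeholders.

```python
def build_onelake_path(lakehouse_name, relative_path):
    """Build the path inside the workspace: <lakehouse>.Lakehouse/Files/<relative_path>."""
    return f"{lakehouse_name}.Lakehouse/Files/{relative_path.lstrip('/')}"


def upload_to_onelake(workspace_name, lakehouse_name, relative_path, data,
                      tenant_id, client_id, client_secret):
    """Upload bytes to OneLake using a Service Principal (hypothetical helper)."""
    # Imported lazily so the path helper above works without the Azure SDK installed
    from azure.identity import ClientSecretCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    credential = ClientSecretCredential(tenant_id, client_id, client_secret)
    service = DataLakeServiceClient(
        "https://onelake.dfs.fabric.microsoft.com", credential=credential
    )
    # Assumption: the Fabric workspace plays the role of the ADLS file system
    file_system = service.get_file_system_client(workspace_name)
    file_client = file_system.get_file_client(
        build_onelake_path(lakehouse_name, relative_path)
    )
    file_client.upload_data(data, overwrite=True)
```

Inside the loop this would be called as something like `upload_to_onelake("MyWorkspace", "MyLakehouse", "attachments/" + filename, item["DecodedData"], ...)`, with the secret read from a secure store rather than hard-coded.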

The following code worked well for me:

    import os

    collected_data = df_with_decoded_data.collect()

    base_dir = "/lakehouse/default/Files/attachments"  # File API path
    # Ensure the base directory exists
    os.makedirs(base_dir, exist_ok=True)

    # Write the decoded bytes to the file
    for item in collected_data:
        # Construct the filename from AttachmentId plus the last four characters of FileName
        # (this assumes a dot followed by a three-letter extension, e.g. ".pdf")
        filename = item["AttachmentId"] + item["FileName"][-4:]
        # Full path for the file
        file_path = os.path.join(base_dir, filename)

        # Ensure the directory for the file exists (in case the filename includes subdirectories)
        os.makedirs(os.path.dirname(file_path), exist_ok=True)

        # Write the Body content to the file
        with open(file_path, "wb") as file:
            file.write(item["DecodedData"])
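One caveat with `item["FileName"][-4:]` is that it only yields a proper extension when the extension is exactly three letters long. A possible variation, sketched below, uses `os.path.splitext` instead. The helper names `build_attachment_path` and `write_attachment` are made up for illustration, and the demo writes to a temporary directory rather than the lakehouse path.

```python
import os
import tempfile


def build_attachment_path(base_dir, attachment_id, original_name):
    """Return base_dir/<attachment_id><ext>, keeping the original file extension."""
    _, ext = os.path.splitext(original_name)  # ext includes the leading dot, or is ""
    return os.path.join(base_dir, f"{attachment_id}{ext}")


def write_attachment(base_dir, attachment_id, original_name, data):
    """Write the decoded bytes under base_dir and return the path that was written."""
    path = build_attachment_path(base_dir, attachment_id, original_name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return path


# Demo against a temporary directory; in a Fabric notebook base_dir would be
# "/lakehouse/default/Files/attachments" as in the code above.
demo_dir = tempfile.mkdtemp()
demo_path = write_attachment(demo_dir, "A001", "photo.jpeg", b"\xff\xd8")
```

In the original loop this would be called as `write_attachment(base_dir, item["AttachmentId"], item["FileName"], item["DecodedData"])`.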