Read shapefile from Azure Blob Storage in Azure Machine Learning

Question

90 views Asked by Vanaclocha At 27 February 2024 at 17:24

I have several shapefiles stored in Azure Blob Storage within my Azure Machine Learning workspace, each comprising the files: file.fix, file.shp, file.dbf, file.prj, and file.shx. I need to directly access and read these shapefiles within my Azure Machine Learning environment.

So far, I've successfully read Parquet files using the following code:

Dataset.Tabular.from_parquet_files(path=[(datastore, file_path)]).to_pandas_dataframe()

and CSV files using:

table = Dataset.Tabular.from_delimited_files(path=[(datastore, file_path)]).to_pandas_dataframe()

While I did come across a solution for reading shapefiles in Azure Databricks, I haven't found a direct method for accomplishing this within Azure Machine Learning.

I understand that one workaround could be to download the files and read them locally within the code. However, I'm unsure about the implementation details for this approach.

Any help would be greatly appreciated.

Original Q&A

1 answer

**JayashankarGS** · Accepted Answer · 2024-02-29T06:16:08+00:00

You can follow the approach below.

Code to download all the required files:

from azure.storage.blob import BlobServiceClient
import os

def download_blob_folder(blob_service_client, container_name, folder_name, local_directory):
    """Download every blob under folder_name into local_directory."""
    container_client = blob_service_client.get_container_client(container_name)
    # Normalize the prefix so that "map" matches "map/..." but not "map2/..."
    prefix = folder_name.rstrip("/") + "/"
    blobs_list = container_client.list_blobs(name_starts_with=prefix)

    for blob in blobs_list:
        blob_relative_path = blob.name[len(prefix):]
        # Skip entries that have no file name below the prefix
        if not blob_relative_path:
            continue
        local_file_path = os.path.join(local_directory, blob_relative_path)

        # Create any nested local directories before writing the file
        os.makedirs(os.path.dirname(local_file_path), exist_ok=True)

        blob_client = container_client.get_blob_client(blob.name)
        with open(local_file_path, "wb") as file:
            download_stream = blob_client.download_blob()
            file.write(download_stream.readall())
        print(f"Blob '{blob.name}' downloaded to '{local_file_path}'")


# Placeholder values: replace with your storage account's connection string
connection_string = "DefaultEndpointsProtocol=https;AccountName=xxxx;AccountKey=xxxx;EndpointSuffix=core.windows.net"
container_name = "sample"
folder_name = "map"


blob_service_client = BlobServiceClient.from_connection_string(connection_string)


local_directory = "./downloaded_files/"

download_blob_folder(blob_service_client, container_name, folder_name, local_directory)

Here, I am downloading all the files required to read the shapefile from the folder map. Because the code above downloads every file in the given folder, make sure the folder contains only the files that make up the shapefile.
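If the folder also holds unrelated files, one option is to filter the blob names before downloading. The following is only a sketch: `select_shapefile_blobs` and `missing_required_parts` are hypothetical helper names, the extension sets are an assumption based on the files listed in the question (plus `.cpg`, a common optional sidecar), and `.shp`, `.shx` and `.dbf` are treated as the mandatory parts of a shapefile.

```python
import posixpath

# .shp, .shx and .dbf are the mandatory parts of a shapefile;
# the optional set below is an assumption and can be adjusted.
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf"}
OPTIONAL_EXTENSIONS = {".prj", ".cpg", ".fix"}


def select_shapefile_blobs(blob_names, stem):
    """Return the blob names whose base name is `stem` plus a known shapefile extension."""
    allowed = REQUIRED_EXTENSIONS | OPTIONAL_EXTENSIONS
    selected = []
    for name in blob_names:
        # Blob names always use "/" as separator, hence posixpath
        base, ext = posixpath.splitext(posixpath.basename(name))
        if base == stem and ext.lower() in allowed:
            selected.append(name)
    return selected


def missing_required_parts(blob_names):
    """Return the mandatory extensions absent from `blob_names`, sorted."""
    present = {posixpath.splitext(name)[1].lower() for name in blob_names}
    return sorted(REQUIRED_EXTENSIONS - present)
```

The names passed in could be collected from `blob.name` for each item returned by `list_blobs`, and only the selected names would then be downloaded. Checking `missing_required_parts` on the selection before calling geopandas gives an earlier and clearer error than a failed read.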

After downloading, install geopandas through pip:

pip install geopandas

Code to read:

import geopandas as gpd

file_name="./downloaded_files/gadm41_IND_0.shp"
data = gpd.read_file(file_name)
data
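Since the question already uses the azureml-core `Dataset` API, another option may be to download the files through a `FileDataset` rather than a connection string. This is a sketch that has not been run against a live workspace: `download_shapefile_from_datastore` and `first_shp_path` are hypothetical helper names, and `datastore` is assumed to be the same `Datastore` object used in the question's Parquet and CSV examples.

```python
def download_shapefile_from_datastore(datastore, folder_path, target_dir):
    """Download the files directly inside `folder_path` and return their local paths."""
    # Imported lazily so this module can be loaded without azureml-core installed
    from azureml.core import Dataset

    # The "/*" glob selects the files directly inside the folder
    pattern = folder_path.rstrip("/") + "/*"
    file_dataset = Dataset.File.from_files(path=[(datastore, pattern)])
    return file_dataset.download(target_path=target_dir, overwrite=True)


def first_shp_path(local_paths):
    """Return the first path ending in .shp (case-insensitive), or None if there is none."""
    for path in local_paths:
        if path.lower().endswith(".shp"):
            return path
    return None
```

A possible usage, assuming the shapefile lives in the folder map as above: call `paths = download_shapefile_from_datastore(datastore, "map", "./downloaded_files")`, then pass `first_shp_path(paths)` to `gpd.read_file`. geopandas picks up the `.shx`, `.dbf` and `.prj` files automatically as long as they sit next to the `.shp` file.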

