I have several shapefiles stored in Azure Blob Storage within my Azure Machine Learning workspace, each comprising the files: file.fix, file.shp, file.dbf, file.prj, and file.shx. I need to directly access and read these shapefiles within my Azure Machine Learning environment.
So far, I've successfully read Parquet files using the following code:
Dataset.Tabular.from_parquet_files(path=[(datastore, file_path)]).to_pandas_dataframe()
and CSV files using:
table = Dataset.Tabular.from_delimited_files(path=[(datastore, file_path)]).to_pandas_dataframe()
While I did come across a solution for reading shapefiles in Azure Databricks, I haven't found a direct method for accomplishing this within Azure Machine Learning.
I understand that one workaround could be to download the files and read them locally within the code. However, I'm unsure about the implementation details for this approach.
Any help would be greatly appreciated.
You can follow the approach below.
Code to download all the required files:
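The download snippet is missing here; below is a minimal sketch using the Azure ML SDK v1 `Datastore.download` method. The datastore name `workspaceblobstore`, the folder name `map`, and the local target directory `shapefile_data` are assumptions — replace them with your own values.

```python
def download_folder(datastore_name, prefix, target_path):
    """Download every blob under `prefix` from a registered datastore
    to `target_path` on the local filesystem (Azure ML SDK v1)."""
    # Imported inside the function so the sketch only requires
    # azureml-core when it is actually run.
    from azureml.core import Workspace, Datastore

    ws = Workspace.from_config()            # reads config.json on the compute
    ds = Datastore.get(ws, datastore_name)  # registered blob datastore
    # Downloads ALL files whose path starts with `prefix`
    ds.download(target_path=target_path, prefix=prefix, overwrite=True)

# Example call (assumed names):
# download_folder("workspaceblobstore", "map", "shapefile_data")
```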
Here, I am downloading all the files required to read a shapefile from the folder `map`. Make sure that folder contains only the files you need, because the above code downloads every file present in the given folder. After downloading, install geopandas through pip:
pip install geopandas
Code to read:
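The reading snippet is also missing; below is a minimal sketch using `geopandas.read_file`, which reads the `.shp` file and automatically picks up the sibling `.dbf`, `.shx`, and `.prj` files from the same directory. The local path is an assumption based on the download step above.

```python
def read_shapefile(shp_path):
    """Read a local shapefile into a GeoDataFrame."""
    # Local import so the sketch only requires geopandas when run.
    import geopandas as gpd
    return gpd.read_file(shp_path)

# Example call (assumed local path after downloading):
# gdf = read_shapefile("shapefile_data/map/file.shp")
# print(gdf.head())
```

A GeoDataFrame is a pandas DataFrame subclass, so if you want a plain DataFrame like the Parquet/CSV examples in the question, you can drop the geometry column with `gdf.drop(columns="geometry")`.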