We have an files partitioned in the datalake and are using Azure Synapse SQL Serverless pool to query them using external tables before visualising in Power BI.
Files are stored in the following partition format {source}/{year}/{month}/{filename}_{date}.parquet
We then have an external table that loads all files for that source.
For all files that increment each day this is working great as we want all files to be included. However we have some integrations that we want to return only the latest file. (i.e. the latest file sent to us is the current state that we want to load into Power BI).
Is it possible in the external table statement to only return the latest file? Or do we have to add extra logic? We could load all the files in, and then filter for the latest filename and save that in a new location. Alternatively we could try to create an external table that changes every day.
Is there a better way to approach this?
You can use file metadata to query for the latest file but you will need to ensure your parquet files begin with date stamps:
Take a look at https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-specific-files for more information on using file metadata in SQL Serverless queries.