Maybe I am being dense, but is it possible to add a "request_payer" option to a pdal info call to S3?
For example:
This object sits in a "request_payer" (Requester Pays) bucket on AWS: s3://usgs-lidar/Projects/CA_CarrHirzDeltaFires_2019_B19/CA_CarrHirzDeltaFires_1_2019/LAZ/USGS_LPC_CA_CarrHirzDeltaFires_2019_B19_10TDK0479244992.laz
Ideally, I'd like to run the command below and get the summary results:
pdal info --summary s3://usgs-lidar/Projects/CA_CarrHirzDeltaFires_2019_B19/CA_CarrHirzDeltaFires_1_2019/LAZ/USGS_LPC_CA_CarrHirzDeltaFires_2019_B19_10TDK0479244992.laz
I just don't know how to tell pdal info to make the request as the requester (i.e., that I agree to pay for it).
I am attempting to do this from the Ubuntu command line; however, if someone knows how to do this via Python, that would be much appreciated as well.
I tried setting AWS_REQUEST_PAYER=requester as an environment variable, but I am under the impression that AWS S3 does not recognize AWS_REQUEST_PAYER as a valid variable:
AWS_REQUEST_PAYER=requester pdal info --summary s3://usgs-lidar/Projects/CA_CarrHirzDeltaFires_2019_B19/CA_CarrHirzDeltaFires_1_2019/LAZ/USGS_LPC_CA_CarrHirzDeltaFires_2019_B19_10TDK0479244992.laz
I also explored boto3 as a possible alternative, but I am not sure how it would help.
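With boto3, requester pays is not an environment variable but a per-request parameter: S3 calls such as get_object accept RequestPayer="requester", and the managed download_file transfer accepts the same key through ExtraArgs. A minimal sketch, assuming boto3 is installed and AWS credentials that can be billed are configured; split_s3_url and download_requester_pays are hypothetical helper names. Note that this downloads the whole object, after which pdal info can be pointed at the local copy.

```python
from urllib.parse import urlparse


def split_s3_url(s3_url: str) -> tuple[str, str]:
    """Split an s3://bucket/key URL into (bucket, key). Hypothetical helper."""
    parsed = urlparse(s3_url)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"not an s3://bucket/key URL: {s3_url!r}")
    return parsed.netloc, key


# ExtraArgs for boto3's managed transfer; "requester" acknowledges that the
# caller's AWS account is billed for the request and data transfer.
REQUESTER_PAYS = {"RequestPayer": "requester"}


def download_requester_pays(s3_url: str, local_path: str) -> None:
    """Download a whole object from a Requester Pays bucket (needs boto3 + credentials)."""
    import boto3  # imported lazily so split_s3_url works without boto3 installed

    bucket, key = split_s3_url(s3_url)
    boto3.client("s3").download_file(bucket, key, local_path, ExtraArgs=REQUESTER_PAYS)
```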
I did test the same pdal info --summary call on a piece of data that did not require requester pays, and got the expected result.
The goal of the above was to read only the metadata from a point cloud on S3. The data I wanted to read sat in a requester-pays bucket. I needed to do this operation 400k+ times, and I wanted to avoid downloading the entire file just for the metadata. My mistake was assuming that a pdal info call to S3 would retrieve only the metadata; however, I now believe PDAL needs the entire file for this request, due to the structure of the LAZ file. This negates the "quick" call to S3 for just the header information.

Workaround
I lowered my expectations about the information I could easily extract from the point cloud header and instead focused on getting the extents and point count. For those, I only need the first 10 KB of the file. This allows me to perform a range request for bytes 0-10000 and use any number of tools (s3fs, boto3, or awscli) for the operation. It is not pretty, but it serves my needs.
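A minimal sketch of that range request in boto3, assuming a client created with boto3.client("s3") and credentials that can be billed. HTTP ranges are inclusive, so the first 10,000 bytes are bytes=0-9999. The names byte_range and fetch_prefix are hypothetical; fetch_prefix only relies on the client's get_object method.

```python
def byte_range(nbytes: int, start: int = 0) -> str:
    """HTTP Range header value for nbytes bytes beginning at start (end is inclusive)."""
    if nbytes <= 0 or start < 0:
        raise ValueError("nbytes must be positive and start non-negative")
    return f"bytes={start}-{start + nbytes - 1}"


def fetch_prefix(client, bucket: str, key: str, nbytes: int = 10_000) -> bytes:
    """Read only the first nbytes of an object in a Requester Pays bucket.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    response = client.get_object(
        Bucket=bucket,
        Key=key,
        Range=byte_range(nbytes),
        # Requests to a Requester Pays bucket without this are rejected (403).
        RequestPayer="requester",
    )
    return response["Body"].read()
```

For example, header = fetch_prefix(boto3.client("s3"), "usgs-lidar", key), with key taken from the s3:// URL above. Only the requested bytes are transferred, which keeps the cost per file small when repeated 400k+ times.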
I then use laspy to get the point count and extents from those bytes.
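The point count and extents live in the LAS public header block, which LAZ files keep uncompressed at the start of the file; that is why a small range request is enough. As an alternative to laspy, a minimal sketch that decodes those fields with the standard struct module, assuming the little-endian header offsets from the ASPRS LAS 1.2 to 1.4 specification. For LAS 1.4 files the 64-bit point count at offset 247 is used, since the legacy 32-bit count may be zero for point formats 6 and above. The name read_las_summary is hypothetical.

```python
import struct


def read_las_summary(header: bytes) -> dict:
    """Decode version, point count and extents from a LAS/LAZ public header block."""
    if len(header) < 227 or header[:4] != b"LASF":
        raise ValueError("not a LAS/LAZ header (need >= 227 bytes starting with LASF)")
    major, minor = struct.unpack_from("<BB", header, 24)
    # Legacy 32-bit "number of point records" at offset 107.
    (count,) = struct.unpack_from("<I", header, 107)
    if (major, minor) >= (1, 4):
        if len(header) < 375:
            raise ValueError("a LAS 1.4 header needs at least 375 bytes")
        # LAS 1.4 stores the 64-bit point count at offset 247.
        (count,) = struct.unpack_from("<Q", header, 247)
    # Six doubles from offset 179, in spec order: max X, min X, max Y, min Y, max Z, min Z.
    max_x, min_x, max_y, min_y, max_z, min_z = struct.unpack_from("<6d", header, 179)
    return {
        "version": f"{major}.{minor}",
        "point_count": count,
        "minx": min_x, "miny": min_y, "minz": min_z,
        "maxx": max_x, "maxy": max_y, "maxz": max_z,
    }
```

For example, read_las_summary(first_10kb), where first_10kb holds the bytes returned by the range request. Reading the header directly avoids a laspy dependency, at the cost of handling only the fields decoded above.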