PDAL info "request_payer"


Maybe I am being dense, but is it possible to add a "request_payer" to a pdal info call to s3?

For example:

This object sits in a requester-pays bucket on AWS: s3://usgs-lidar/Projects/CA_CarrHirzDeltaFires_2019_B19/CA_CarrHirzDeltaFires_1_2019/LAZ/USGS_LPC_CA_CarrHirzDeltaFires_2019_B19_10TDK0479244992.laz

Ideally, I'd like to run the command below and get the summary results:

pdal info --summary s3://usgs-lidar/Projects/CA_CarrHirzDeltaFires_2019_B19/CA_CarrHirzDeltaFires_1_2019/LAZ/USGS_LPC_CA_CarrHirzDeltaFires_2019_B19_10TDK0479244992.laz

I just don't know how to tell pdal info that the requester pays.

I am attempting to do this from the Ubuntu command line; however, if someone knows how to do this in Python, that would be much appreciated as well.

I tried setting AWS_REQUEST_PAYER=requester as an environment variable, but I am under the impression that AWS_REQUEST_PAYER is not recognized as a valid variable:

AWS_REQUEST_PAYER=requester pdal info --summary s3://usgs-lidar/Projects/CA_CarrHirzDeltaFires_2019_B19/CA_CarrHirzDeltaFires_1_2019/LAZ/USGS_LPC_CA_CarrHirzDeltaFires_2019_B19_10TDK0479244992.laz

I also explored boto3 as a possible alternative, but I am not sure how it would help.

I did test the same pdal info --summary call on an object that does not require requester pays and got the expected result.


1 answer

Ian On

The goal was to read only the metadata of a point cloud on S3. The data sat in a requester-pays bucket. I needed to do this 400k+ times and wanted to avoid downloading each entire file just for the metadata. My mistake was assuming that a pdal info call to S3 would retrieve only the metadata; I now believe PDAL needs the entire file for this request because of how LAZ files are structured. That rules out a "quick" call to S3 for just the header information.

Workaround

I lowered my expectations about what I could easily extract from the point cloud header and focused on the extents and point count. For those I only need roughly the first 10 KB of the file, so I can make a range request for bytes 0-10000 with any of several tools (s3fs, boto3, or awscli).

It is not pretty, but it serves my needs:

  1. Download the first 10 KB
  2. Save the bytes as a .laz file
  3. Parse the header using laspy for point count and extents

Something like:

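import boto3
import laspy

# BUCKET, KEY and part_laz_file are placeholders to fill in.
s3 = boto3.client("s3")
# Range request for the start of the object; the requester pays for the transfer.
part_laz = s3.get_object(Bucket=BUCKET, Key=KEY, RequestPayer="requester", Range="bytes=0-10000")
body = part_laz["Body"].read()
with open(part_laz_file, "wb") as f:
    f.write(body)

# Open the truncated file and keep only its header.
with laspy.open(part_laz_file) as plaz:
    header = plaz.header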
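
If writing a temporary file for each of 400k+ objects is undesirable, the same two fields can probably be read straight from the downloaded bytes. What follows is a minimal sketch, not part of the workaround above: it assumes the standard LAS public header layout (LAS 1.0 to 1.4, little-endian, which LAZ keeps uncompressed at the start of the file), and parse_las_header is a hypothetical helper name introduced here for illustration.

```python
import struct


def parse_las_header(data: bytes) -> dict:
    """Extract version, point count and extents from the first bytes of a LAS/LAZ file.

    Hypothetical helper; offsets follow the LAS public header block layout.
    """
    if len(data) < 227 or data[:4] != b"LASF":
        raise ValueError("not a LAS/LAZ header, or too few bytes fetched")

    # Version major/minor are single bytes at offsets 24 and 25.
    major, minor = struct.unpack_from("<BB", data, 24)

    # Legacy 32-bit point count (the only count before LAS 1.4).
    (count,) = struct.unpack_from("<I", data, 107)

    # Extents are stored as max/min pairs for x, then y, then z.
    max_x, min_x, max_y, min_y, max_z, min_z = struct.unpack_from("<6d", data, 179)

    # LAS 1.4 adds a 64-bit point count at offset 247; prefer it when it is set.
    if (major, minor) >= (1, 4) and len(data) >= 255:
        (count64,) = struct.unpack_from("<Q", data, 247)
        count = count64 or count

    return {
        "version": (major, minor),
        "point_count": count,
        "min": (min_x, min_y, min_z),
        "max": (max_x, max_y, max_z),
    }
```

With the range request above, this would be called as parse_las_header(body). The result has not been compared against laspy here, so it is worth checking a few files with both before relying on it.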