SQL Error [XX000]: ERROR: Spectrum Scan Error: DeltaManifest

916 views Asked by At

We have implemented delta lake but one issue as below: One table can be created and ingested, but after new data has been ingested, we will spectrum scan error:

SQL Error [XX000]: ERROR: Spectrum Scan Error: DeltaManifest Detail:

error: Spectrum Scan Error: DeltaManifest code: 15005 context: Error fetching Delta Lake manifest [tablenamexxx]/target/_symlink_format_manifest/active_ind=Y/creation_time=2022-05-10/manifest Message: S3ServiceException:The specified key does not exist.,Status 404,Error NoSuchKey,Rid EFHMMKBZ1EG5ZRJV,ExtRid p query: 4100335 location: scan_range_manager.cpp:1182 process: worker_thread [pid=9305]

By further checking, we found it was caused 2 partitions setup for this table. Once the new data ingested, it will expiry the old data, then move all data [creation_time=2022-05-10] to inactive_ind partition. _symlink_format_manifest has been updated corrrectly, no creation_time=2022-05-10 & active_ind='Y', but partations in Glue Catalog still keep that. That's causing the error.

If we manually run below script to drop the partition, it back to normal. Wondering why and how to resolve it? ALTER TABLE XXX DROP PARTITION (creation_time = '2022-05-10', active_ind = 'Y');

1

There are 1 answers

0
Richard Haussmann On

I had this same error. There weren't any partitions the error referred to in either the _symlink_format_manifest or the partition list. Rerunning the Glue crawler didn't do anything either.

The problem was that there was more than just the one partition and running a query to find the next 'bad' partition was inefficient. To get the full list:

select schemaname, tablename, values, location
from svv_external_partitions
where tablename = 'tablename';

Compare the results to the partitions seen in S3 and delete the list not appearing in S3 using your command:

ALTER TABLE XXX DROP PARTITION (partition_field='2022-10-17')

(dropping the active_ind=Y) command for each.