Joining a subquery on composite keys with multi-million record tables

273 views Asked by At

I've got some very very large tables - one is around 200,000,000 (scans) and the other is around 50,000,000 (piece details). They are often joined on a set of matching columns - specifically USPS barcode data (zip, zip_4, zip_delivery_point, serial_number). Since it's typically the piece detail record locating matching scans, I wanted to partition the scan records by the lookup criteria (the piece detail records are partitioned by a different set of keys representing a mailing). This led me to move from an autoincrementing PK to a composite of zip, zip_4, zip_delivery_point, serial_number, scan_date_time.

The issue that I'm now running into is optimizing some of my large reporting queries. I usually run some scripts to process the data that includes finding the first scan date for each mail piece, the latest scan date for each mail piece, calculating delivery dates, etc. I often find myself needing to perform a join like this:

SELECT a, b, c, s1.first, s2.latest
FROM (subquery to grab MIN() scan record) s1
JOIN _scan s2 ON some_id = (subquery to grab MAX() scan record using fields from s1)

Since I don't have a nice single col for my PK, I'm not able to perform a join on records in this way unless I concatenate my composite key values either on the fly or up front to sort of fake a single column key. This seems pretty redundant, though.

What's a good way to maintain the physical order of the records sorted by my current composite key while also being able to join on the results of a subquery that can utilize a previous subquery (s1 and s2, in my example)?

Note: I can currently accomplish correct results by replacing the join table with a subquery and then joining on that alias, but mysql seems to want to load the entire table into memory before it performs the join this way.

TABLE STRUCTURES

CREATE TABLE `_scan` (
  `facility_zip` varchar(5) NOT NULL,
  `operation_code` varchar(3) NOT NULL,
  `scan_date_time` datetime NOT NULL,
  `zip` varchar(5) NOT NULL DEFAULT '',
  `zip_4` varchar(4) NOT NULL,
  `zip_delivery_point` varchar(2) NOT NULL,
  `barcode_identifier` varchar(2) NOT NULL,
  `service_type` varchar(3) NOT NULL,
  `mailer_id` varchar(9) NOT NULL,
  `serial_number` varchar(9) NOT NULL,
  `new` tinyint(1) NOT NULL DEFAULT '1',
  `first_scan` tinyint(1) DEFAULT NULL,
  PRIMARY KEY (`zip`,`zip_4`,`zip_delivery_point`,`serial_number`,`scan_date_time`),
  KEY `operation_code` (`operation_code`)
) ENGINE=InnoDB
/*!50100 PARTITION BY KEY (zip,zip_4,zip_delivery_point,serial_number,scan_date_time)
PARTITIONS 10 */;

CREATE TABLE `_piece_detail` (
  `job_id` varchar(8) NOT NULL,
  `piece_id` bigint(20) NOT NULL,
  `cqt_database_id` int(11) NOT NULL,
  `package_id` int(11) NOT NULL,
  `barcode_identifier` int(11) DEFAULT NULL,
  `service_type` int(11) DEFAULT NULL,
  `mailer_id` varchar(9) DEFAULT NULL,
  `serial_number` varchar(9) NOT NULL,
  `zip` varchar(5) NOT NULL,
  `zip_4` varchar(4) NOT NULL,
  `zip_delivery_point` varchar(2) NOT NULL,
  `est_delivery_date` date DEFAULT NULL,
  `first_scan_date_time` datetime DEFAULT NULL,
  `latest_scan_date_time` datetime DEFAULT NULL,
  PRIMARY KEY (`job_id`,`piece_id`),
  KEY `cqt_database_id` (`cqt_database_id`) USING BTREE,
  KEY `est_delivery_date` (`est_delivery_date`),
  KEY `zip` (`zip`,`zip_4`,`zip_delivery_point`,`serial_number`),
  KEY `first_scan_date_time` (`first_scan_date_time`),
  KEY `latest_scan_date_time` (`latest_scan_date_time`)
) ENGINE=InnoDB
/*!50100 PARTITION BY KEY (job_id,piece_id)
PARTITIONS 10 */;
0

There are 0 answers