Matching Arrival & Departures from long GPS data (long data to wide data)

25 views Asked by At

I have a large dataset of GPS coordinates for vessels. The dataset has over 8MM records and to try and simplify, I've focused on a specific vessel to test out the logic if this would work.

If I run a quick summary of records for a given vessel using the following MySQL query in a subset table of the whole dataset:

    SELECT equipment_id,equipment_name,geolocation,unlocode,
    status,when_created AS event_date,COUNT(*)
    FROM gpsdata_20231205_AD 
    WHERE equipment_name = 'CMA CGM AMBER' 
    GROUP BY equipment_name,geolocation,status 
    ORDER BY when_created,status;

I get the following output:

equipment_id equipment_name geolocation unlocode status event_date COUNT(*)
215148000 CMA CGM AMBER MANILA PHMNL Arrival 10/4/2023 2:03 5
215148000 CMA CGM AMBER MANILA PHMNL Departure 10/5/2023 5:16 1
215148000 CMA CGM AMBER SHANGHAI CNSHA Arrival 10/9/2023 11:06 4
215148000 CMA CGM AMBER SHANGHAI CNSHA Departure 10/10/2023 0:48 5
215148000 CMA CGM AMBER CAI MEP NULL Arrival 11/30/2023 20:06 4
215148000 CMA CGM AMBER CAI MEP NULL Departure 12/1/2023 15:13 3
215148000 CMA CGM AMBER PORT KLANG ANCH NULL Arrival 12/3/2023 15:29 3
215148000 CMA CGM AMBER PORT KLANG ANCH NULL Departure 12/4/2023 5:15 3
215148000 CMA CGM AMBER PORT KLANG NULL Arrival 12/4/2023 5:43 3
215148000 CMA CGM AMBER PORT KLANG NULL Departure 12/5/2023 6:04 2

The above query result should give some insight on what the dataset looks like.

The desired output I'm attempting to achieve is to match equipment_id,equipment_name,geolocation and get arrival and departure status & event date for a given geolocation.

So my desired output, using the first 2 geolocations as examples would look something like:

equipment_id equipment_name geolocation unlocode a_status a_event_date d_status d_event_date
215148000 CMA CGM AMBER MANILA PHMNL Arrival 10/4/2023 2:03 Departure 10/5/2023 5:16
215148000 CMA CGM AMBER SHANGHAI CNSHA Arrival 10/9/2023 11:06 Departure 10/10/2023 0:48

I believe that I have the general concept on how to build the query, I'm having issues on how to build the logic that ensures I'm getting the correct event dates (given, that the dataset has multiple records for the same equipment_id,equipment_name,geolocation & status.) ** The closest to my desired output was:**

SELECT g1.equipment_id,g1.equipment_name,g1.geolocation,
g1.unlocode,g1.status,g1.when_created AS a_event_date,
g2.status,g2.when_created AS d_event_date,
DATEDIFF(g2.when_created,g1.when_created) AS event_delta
FROM gpsdata_20231205_AD AS g1 
JOIN gpsdata_20231205_AD AS g2 ON(g1.equipment_id = g2.equipment_id AND g2.status = 'Departure')
WHERE g1.status = 'Arrival' 
AND g1.geolocation NOT LIKE '%ANCH%'
AND g1.equipment_name ='CMA CGM AMBER'
AND g1.when_created < g2.when_created
GROUP BY g1.equipment_id,g1.equipment_name,g1.geolocation
ORDER BY g1.when_created,g2.when_created;

This is the output for the above query:

equipment_id equipment_name geolocation unlocode status a_event_date status d_event_date event_delta next_geoloc next_unloc
215148000 CMA CGM AMBER MANILA PHMNL Arrival 10/4/2023 2:03 Departure 10/5/2023 5:16 1 SHANGHAI CNSHA
215148000 CMA CGM AMBER SHANGHAI CNSHA Arrival 10/9/2023 11:06 Departure 10/10/2023 0:48 1 CJK NULL
215148000 CMA CGM AMBER CJK NULL Arrival 10/10/2023 6:03 Departure 10/16/2023 2:47 6 CAI MEP NULL
215148000 CMA CGM AMBER CAI MEP NULL Arrival 11/30/2023 20:06 Departure 12/1/2023 15:13 1 PORT KLANG NULL
215148000 CMA CGM AMBER PORT KLANG NULL Arrival 12/4/2023 5:43 Departure 12/5/2023 6:04 1 TBD TBD

Given the time spent working with this data, I believe at this points I have tunnel vision where I can't see if my approach is correct. Not quite sure how I can ensure I'll get the "MIN" when created date for a_event_date (earliest/ 1st date of arrival) and "MAX" for d_event_date (latest/last date for departure).

If the above query is correct and there isn't a flaw, how would I incorporate the two last columns, next_geoloc & next_unloc? I have no idea how to approach that.

Any feedback is welcome.

Preemptively, thank you for your time.

TB.

0

There are 0 answers