I have an agent-based model that simulates parcel delivery using 7 trucks. All trucks start on standby at the warehouse. I can manually set the departure time of each truck, under different load utilizations, and evaluate the resulting performance (see Fig. 1).
However, I am looking for a better way to schedule each truck's departure time so that I minimize late parcel deliveries and maximize deliveries that arrive earlier than promised. The learned result could look something like Fig. 2, where the model works out the most appropriate time to send out each truck. Note: a truck's departure time directly affects the on-time delivery performance of its parcels.
I understand that each truck can have a Q-table containing all of its departure-time options. However, I am not sure how to link the final result (a global result across all trucks, i.e. the total parcel delay and the total number of parcels delivered early) to each truck's individual learning and Q-value update. Could anyone explain how to do this in this specific case? It seems that some coordination between the trucks is needed to drive better overall performance. Are there any other search methods worth considering?
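To make the setup concrete, here is a minimal sketch of one simple way to tie a global result to per-truck updates: every truck keeps its own Q-table over departure slots, and after each simulation run all trucks are updated with the same team reward. This is only an illustration under stated assumptions. `simulate`, `TARGETS`, the 8 departure slots, and the reward weight `w` are hypothetical stand-ins for the real agent-based model, and because each truck makes a single decision per run, the Q-table is stateless (one value per departure slot).

```python
import random

N_TRUCKS = 7
SLOTS = list(range(8))  # hypothetical departure-time options (slot indices)
# Hypothetical "ideal" slot per truck, used only by the toy simulator below.
TARGETS = [i % len(SLOTS) for i in range(N_TRUCKS)]


def simulate(departures):
    """Toy stand-in for the agent-based model (an assumption, not the real one).

    Returns (total_delay, early_count) for one joint schedule.
    """
    total_delay = sum(abs(d - t) for d, t in zip(departures, TARGETS))
    early_count = sum(1 for d, t in zip(departures, TARGETS) if d == t)
    return total_delay, early_count


def team_reward(departures, w=1.0):
    """Single global reward shared by all trucks: early parcels minus weighted delay."""
    total_delay, early_count = simulate(departures)
    return early_count - w * total_delay


def best_action(q_row):
    """Index of the departure slot with the highest Q-value."""
    return max(range(len(q_row)), key=q_row.__getitem__)


def train(episodes=3000, alpha=0.1, eps_start=1.0, eps_end=0.05, seed=0):
    """Independent Q-learners, one per truck, all updated with the team reward."""
    rng = random.Random(seed)
    q = [[0.0] * len(SLOTS) for _ in range(N_TRUCKS)]
    for ep in range(episodes):
        # Linearly decay exploration from eps_start to eps_end.
        eps = eps_start + (eps_end - eps_start) * ep / max(1, episodes - 1)
        actions = []
        for i in range(N_TRUCKS):
            if rng.random() < eps:
                actions.append(rng.randrange(len(SLOTS)))  # explore
            else:
                actions.append(best_action(q[i]))  # exploit
        r = team_reward([SLOTS[a] for a in actions])
        # Each truck updates only the entry for the slot it chose,
        # but uses the global reward, which links it to the team outcome.
        for i, a in enumerate(actions):
            q[i][a] += alpha * (r - q[i][a])
    return q


def greedy_schedule(q):
    """Departure slot each truck would pick after learning."""
    return [SLOTS[best_action(row)] for row in q]


if __name__ == "__main__":
    q_tables = train()
    schedule = greedy_schedule(q_tables)
    print("learned schedule:", schedule)
    print("team reward:", team_reward(schedule))
```

Sharing one reward keeps the update rule simple, but each truck's signal is noisy because it also reflects what the other six trucks did in that run, so this sketch is a baseline for the coordination question rather than an answer to it.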