I have two source DataFrames:

Storeorder: {columns=Store, Type_of_carriers, No_of_carriers, Total_space_required}
Fleetplanner: {columns=Store, Truck_Type, Truck_space, Route}

The requirements are:

  1. Create a list with {Store, Type_of_carriers, No_of_carriers, Route}.

  2. In the Fleetplanner data, one Store can have more than one Truck_Type and Route. Also, one Route can have multiple Stores (stops) associated with it.

  3. Each time I take a record from Storeorder, I have to decide how many carriers go to which Route.

  4. At the same time, I have to update the Fleetplanner data with the space left for the next Stores.
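
A minimal sketch of the sequential logic in steps 3 and 4, written in plain Python with lists of dicts standing in for the two DataFrames. The sample values, the function name `assign_carriers`, the rule that each carrier needs `Total_space_required / No_of_carriers` of space, tracking free space per (Route, Truck_Type), and filling trucks in the order they are listed are all assumptions for illustration, not details stated above.

```python
# Hypothetical sample rows: column names come from the question, values do not.
store_orders = [
    {"Store": "S1", "Type_of_carriers": "cage", "No_of_carriers": 4, "Total_space_required": 8.0},
    {"Store": "S2", "Type_of_carriers": "pallet", "No_of_carriers": 3, "Total_space_required": 9.0},
]
fleet_plan = [
    {"Store": "S1", "Truck_Type": "small", "Truck_space": 6.0, "Route": "R1"},
    {"Store": "S1", "Truck_Type": "large", "Truck_space": 10.0, "Route": "R2"},
    {"Store": "S2", "Truck_Type": "small", "Truck_space": 6.0, "Route": "R1"},
    {"Store": "S2", "Truck_Type": "large", "Truck_space": 10.0, "Route": "R2"},
]


def assign_carriers(store_orders, fleet_plan):
    """Greedily place each order's carriers on the trucks serving its store."""
    remaining = {}  # (Route, Truck_Type) -> free space (assumed shared by all stores on it)
    options = {}    # Store -> list of (Route, Truck_Type) keys, in listed order
    for row in fleet_plan:
        key = (row["Route"], row["Truck_Type"])
        remaining.setdefault(key, row["Truck_space"])
        options.setdefault(row["Store"], []).append(key)

    assignments = []
    for order in store_orders:
        to_place = order["No_of_carriers"]
        if to_place <= 0:
            continue
        # Assumption: every carrier in an order takes the same amount of space.
        per_carrier = order["Total_space_required"] / to_place
        for key in options.get(order["Store"], []):
            if to_place == 0:
                break
            # Whole carriers that still fit on this truck.
            fit = min(to_place, int(remaining[key] // per_carrier))
            if fit > 0:
                assignments.append({
                    "Store": order["Store"],
                    "Type_of_carriers": order["Type_of_carriers"],
                    "No_of_carriers": fit,
                    "Route": key[0],
                })
                remaining[key] -= fit * per_carrier  # step 4: update space left
                to_place -= fit
    return assignments, remaining


assignments, remaining = assign_carriers(store_orders, fleet_plan)
for row in assignments:
    print(row)
print(remaining)
```

In this sketch each order depends on the space left by every earlier order that shares a truck with it, which is the sequential dependency that makes the loop hard to parallelize naively.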

I have implemented this in Pandas using a loop, but it takes a very long time.

Can anyone suggest an alternative way to solve this problem in Spark?

I have solved the problem using Pandas, but I want to parallelize it in Spark.

