I work in a logistic department for a company, recently we have been trying to narrow down the amount of different packaging options that we use.
I have all the necessary product data like length, width, height, volume and also sales data.
So I was thinking if it is possible to use an algorithm to cluster the different volumes of the products and maybe also take into account which sizes are selling the most, to determine, which box sizes would be ideal. (Taking into account how often a product sells is secondary so that is not absolutely necessary)
What I want is that I can give the Algorithm an amount of how many different boxsizes I want and the algorithm should determine where to put the limits, so that there is a solution for every product that we have. With the goal of the optimization being minimum volume wasted while also not using more than the set amount of different boxes.
Also important to note, the orientation of the products and the amount per box is set, so there is no need to determine how to pack the products and how many go into one box idealy or something like that.
What kind of algorithms could be used for a problem like this and what are my options to program them? I was thinking of using Matlab, but would also be open for other possible options. I want to program it, not simply use an existing program like SPSS.
Thanks in advance and forgive me if my english is not the best, I'm not a native speaker.
 
                        
The following C++ program will find optimal solutions for small instances. For 10 input box sizes, each having dimensions randomly chosen in the range 1..100, and for any number 1..10 of box sizes to choose, it computes the answer in a couple of seconds on my computer. For 15 input box sizes, it takes around 10s. For 20 input box sizes, I could compute up to 4 chosen box sizes in about 3 minutes, with memory becoming an issue (it used around 3GB). I had to increase the linker's default stack size to avoid stack overflows.
Example input:
Example output:
I'll explain the algorithm behind this if there's interest. It's better than considering all possible combinations of k candidate box sizes, but not terribly efficient.