I built a GA algorithm to solve the TSP.
It's in an exercise in Norvig's book (AIAMA). He suggested we read the Larrañaga's view on the problem for representations and such. I learned the some cross over operators and mutations there and tried out a few of them to see which suited me better. I also elaborated a fitness function based on the cost of each path.
Well, despite all that I have a question about my algorithm design, I had never worked with AI before so I don't know if my approach is a valid one for GAs.
Here's what I did:
I took a vector of paths (my initial population)
and then at each loop I organized this vector by increasing cost order and took the best X paths in this vector and deleted the other paths.
I then apply XOver and Mutation (at a certain rate) and put them at the position of the old deleted values of the vector.
I then do the same (order vector - extract best etc)
and keep doing it until it stabilizes.
Is it a good approach ? If it isn't give me some north. Cause when I selected the best and didn't keep them (just created a whole new pop from them, via Xover and mutation) I didn't get good results. This way (keeping the best ones - in some experiments I kept only THE best one) I have better results and the result always converges and well fast
State representation : For state representation I chose to use the Path Representation (I was already using it from start and decided to stick with it after I read Larrañaga et all), it is as follows: if we have, let's say, 5 cities and visit the first city, then the second, then the third ... our path representation would be (1, 2, 3, 4, 5)
Initial population generation : Actually I'm generation random points to representing the cities (that's what the book asked me to do, generate random points in a closed square) - but for the sake of comparison I generated a population and stick with it when comparing the results - I think if I didn't stick with one particular population I wouldn't have much to know about my improvements
Best Fit Individuals : The best fit individuals are the one's that have the best traveling cost. (I don't know if I should have - in this problem - used something else as a parameter
crossover : I tried some crossover operators and compared my reuslts with the one presented in the book and ended up using one of the Order CrossOvers ( Larrañaga et al(1999) ): this crossover takes tow random cuts (forming a sub-path) fromboth parent paths and then copies the rest of the path (the cities not yet visited inside that sub-path) from the other parent (starting at the second cut, not at the position '0') - adding the cities they appear in the other parent.
example: paths (1 2 3 4 5) (3 4 2 1 5) choosing the sub-paths (234) and (421) we would have as offsprings (5 |2 3 4| 1) (5 |4 2 1| 3) <- the part inside the | | came from one parent and the other part from the other parent
mutation : For mutation I chose IVM (Inversion Mutation) approach. it takes a sub-path from the original path and inserts it back (inverted) at a random point.
example: path (1 2 3 4 5) sub path ( 2 3 4 )and random insertion after 5
generate : ( 1 5 | 4 3 2 )
First of all, avoid all these abbreviations (GA, TSP, XOver). It is hard to read and some people may have no idea what you are talking about. The first problem with genetic algorithm is How you choose the initial population, How you perform the crossover, How you perform the mutation. The second problem is that the naive understanding of the description may be a terrible one.
For you the best fit individuals are the ones which have already better cost. You can argue that it will be better to take the most diverse individuals, ie the ones which are more likely to explore different part of the problem space. Describe how do you make the following operations:
You also need to evaluate how is your solution improved with running time. For example, rather than
n
iterations your provide100n
, does the solution get better, how much? How similar to each other are the individuals of the last generation when the algorithm stops, etc.Another question, do you try to solve it for a fixed topology or a variable topology??
EDIT : You are using published algorithms so it doesn't seems the problem is on specific operations. For the fitness you are right stick with the path cost. You say
You should keep the best fit individuals and their children. It follow a wicked principle of the nature, Only the best have the right to reproduce. The one which have to be replaced are the least fit individuals.
There are 3 parameters you can tune : The proportion of best fit individual which have children (also the number of individual will be ditched out), the mutation probability, and the number of runs.
To test how your algorithm perform you can sample the best solution by iteration, ie each
t
iteration you save the lower cost. Once plotted it should look like :