Implementing hybrid Particle Swarm Optimization in CNNs? (And, general questions about the algorithms as a whole)


So, I'm fairly new to working with neural networks (I use Keras with the TensorFlow backend). My math background goes just deep enough to understand the concept behind gradient descent optimization, but I'm not confident working with the numbers and symbolic math myself.

I've been building a CNN to classify lung disease types, and I was recently reading about another optimization technique called Particle Swarm Optimization (PSO). So far, I've understood the following:

Gradient Descent:

  • Minimizes the cost function (finds a minimum of the cost function)
  • Starts at some randomly initialized position and looks for the steepest gradient
  • Cost function must be differentiable (slopes = gradient)
  • Usually settles down in one minimum which could be a local or global minimum
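To make sure I have this right, here is a minimal sketch of gradient descent as I understand it. The toy cost function, its hand-written gradient, the learning rate, and the step count are all my own assumptions for illustration; none of this is what Keras does internally.

```python
import numpy as np

def cost(w):
    # Toy bowl-shaped cost (an assumption for illustration); minimum at w = (3, -1)
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2

def grad(w):
    # Analytic gradient of the toy cost; this is why the cost must be differentiable
    return np.array([2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)])

def gradient_descent(w0, lr=0.1, steps=200):
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        # Step against the gradient, i.e. in the direction of steepest descent
        w = w - lr * grad(w)
    return w

w_final = gradient_descent([0.0, 0.0])
print(w_final, cost(w_final))
```

Only one point is tracked, and where it ends up depends on where it started, which is why it can get stuck in a local minimum on a less friendly cost function.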

I understand Gradient Descent well, but I am confused about why PSO is considered a simpler approach. Here is what I know about PSO:

Particle Swarm Optimization:

  • Minimizes cost function
  • Multiple particles start at different locations on this cost function
  • Particles look for minimums but each particle is affected by the swarm
  • This means particles don't settle into a single local minimum and can move out of minimums based on swarm behavior
  • Improves the chance of finding a global minimum
  • Cost function DOES NOT have to be differentiable?
  1. Why does this make sense? My understanding is that a particle is an instance of the model with randomly initialized weights, which gives it a different position on the cost function. That would mean PSO has to train many model instances, whereas gradient descent trains only one. Please correct my understanding of a particle if what I just said is utter nonsense.

  2. Why does the cost function not have to be differentiable? If the particles are looking for a minimum, don't they need to move in the direction of steepest descent, which requires a gradient?

  3. How can one implement PSO for a CNN? I was looking at a library called PySwarms, which left me further frustrated since it doesn't seem to be usable as an optimizer for CNNs.
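For reference, this is a minimal sketch of the basic global-best PSO update as I currently understand it, written in plain NumPy rather than with PySwarms. The toy cost function, the function name `pso`, and the values of `w`, `c1`, `c2`, swarm size, and search range are my own assumptions. Note that it only ever evaluates the cost and never computes a gradient, which I think relates to questions 1 and 2.

```python
import numpy as np

def cost(x):
    # Toy cost (an assumption): not differentiable at its minimum (3, -1)
    return np.abs(x[0] - 3.0) + np.abs(x[1] + 1.0)

def pso(cost_fn, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    # Each particle is one candidate solution (one point on the cost surface)
    pos = rng.uniform(-10.0, 10.0, size=(n_particles, dim))
    vel = np.zeros((n_particles, dim))
    # Best position each particle has seen so far, and the cost there
    pbest_pos = pos.copy()
    pbest_cost = np.array([cost_fn(p) for p in pos])
    # Best position seen by the whole swarm
    g = np.argmin(pbest_cost)
    gbest_pos, gbest_cost = pbest_pos[g].copy(), pbest_cost[g]
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + pull toward own best + pull toward swarm best; no gradient used
        vel = w * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
        pos = pos + vel
        costs = np.array([cost_fn(p) for p in pos])
        improved = costs < pbest_cost
        pbest_pos[improved] = pos[improved]
        pbest_cost[improved] = costs[improved]
        g = np.argmin(pbest_cost)
        gbest_pos, gbest_cost = pbest_pos[g].copy(), pbest_cost[g]
    return gbest_pos, gbest_cost

best_pos, best_cost = pso(cost, dim=2)
print(best_pos, best_cost)
```

If I read this correctly, for a CNN each particle would be a flat vector holding all of the network's weights, and `cost_fn` would load that vector into the model and return the loss on a batch. That would mean one forward pass per particle per iteration, with the number of dimensions equal to the number of weights, which is part of what I'm unsure about.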

(P.S. I am visualizing the cost function as a 3-variable function.)
