This page gives a bit of mathematical background on perceptrons before introducing sigmoid neurons (neurons with a sigmoid activation function): http://neuralnetworksanddeeplearning.com/chap1.html
It starts off with perceptrons and goes on to sigmoid neurons. All is well, but I can't seem to prove the second exercise, "Sigmoid neurons simulating perceptrons, part II", later in the chapter. I have a hard time believing that you can replace a network of perceptrons with a network of sigmoid neurons while leaving the weights and biases unchanged. One can easily construct a counterexample: take weights 17, -6, -3 for the third layer, and one final neuron in the fourth layer with b = -3 and w = {17, -6} in the test w.x + b >= 0. For the input {1, 0, 0} (including the bias input x_0), the perceptron network gives 0 while the sigmoid network can give 1.
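Here is a small numerical sketch of the kind of mismatch I mean. The output neuron's w = {17, -6} and b = -3 are the numbers above; the hidden-layer weights are just made-up values, chosen so that one hidden perceptron's weighted input is barely below 0:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(z):
    return 1.0 if z >= 0 else 0.0  # perceptron / threshold activation

# Two hidden neurons feeding one output neuron. The output weights and
# bias come from the question; the hidden-layer weights and biases are
# made up so that one hidden weighted input is only slightly negative.
w_hidden = np.array([[0.1, -0.2],
                     [-5.0, -5.0]])
b_hidden = np.array([-0.05, -5.0])
w_out = np.array([17.0, -6.0])
b_out = -3.0

x = np.array([0.0, 0.0])               # network input
z_hidden = w_hidden @ x + b_hidden     # [-0.05, -5.0], both below 0

# Perceptron network: hidden outputs are exactly [0, 0], so the output
# neuron sees 17*0 - 6*0 - 3 = -3 < 0 and fires 0.
h_perc = np.array([step(z) for z in z_hidden])
print("perceptron:", step(w_out @ h_perc + b_out))    # 0.0

# Sigmoid network, SAME weights: sigmoid(-0.05) ~ 0.49, not 0, so the
# output neuron sees roughly 17*0.49 - 6*0.007 - 3 > 0 and outputs ~0.99.
h_sig = sigmoid(z_hidden)
print("sigmoid:   ", sigmoid(w_out @ h_sig + b_out))  # ~0.995
```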
Can anyone tell me what I am missing or where I am going wrong? Thank you.
No, you can't, not with the weights unchanged. But sigmoids are continuous approximations of binary threshold units, so the behaviour should be similar. What the page actually asks you to show is a limiting case: multiply all the weights and biases by a positive constant c and let c grow without bound.
Which is true. As you scale all of the weights and biases up, the difference between sigmoid units and threshold units gets smaller and smaller, because very large inputs into a sigmoid produce outputs arbitrarily close to 0 or 1. The only place this breaks down is when some neuron's weighted input w.x + b is exactly 0, which is why the exercise excludes that case.
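To make that concrete, here is a rough numerical sketch of the limit, reusing the numbers from your counterexample (the hidden-layer weights are my own assumption, not from the book): every weight and bias is multiplied by a constant c, and the sigmoid network's output is compared with the perceptron network's output as c grows.

```python
import numpy as np

def sigmoid(z):
    # clip to dodge overflow warnings at huge |z|; the values are unaffected
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def step(z):
    return np.where(z >= 0, 1.0, 0.0)  # perceptron / threshold activation

# Same hypothetical fragment as in the question: two hidden neurons
# (made-up weights) feeding an output neuron with w = [17, -6], b = -3.
w_hidden = np.array([[0.1, -0.2], [-5.0, -5.0]])
b_hidden = np.array([-0.05, -5.0])
w_out = np.array([17.0, -6.0])
b_out = -3.0
x = np.array([0.0, 0.0])

# Perceptron network output. Scaling all weights and biases by c > 0
# never changes a sign, so this is the target behaviour for every c.
h_perc = step(w_hidden @ x + b_hidden)
out_perc = step(w_out @ h_perc + b_out)

for c in [1, 10, 100, 1000]:
    # multiply every weight and bias by c before applying the sigmoid
    h_sig = sigmoid(c * w_hidden @ x + c * b_hidden)
    out_sig = sigmoid(c * w_out @ h_sig + c * b_out)
    print(f"c={c:5d}  sigmoid={out_sig:.6f}  perceptron={out_perc}")

# Prints roughly 0.995, 1.000, 0.000, 0.000 against a perceptron output of 0:
# once c is large enough to saturate the hidden sigmoids, the two networks
# agree, and they keep agreeing as long as no w.x + b is exactly 0.
```

Note how for c = 1 (your original setup) and even c = 10 the two networks still disagree; the equivalence only appears once c is large enough to push every sigmoid into its saturated region, which is exactly the limit the exercise is about.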