I am trying to do a regression with a neural network that takes some physical measurements and predicts various physical values. One of these values is a complex number, so I have split it into two parts and predict the magnitude and phase separately.
I need to constrain the magnitude to lie between 0 and 1, since those are the only physically possible values. Allowing the network to predict values outside this range unlocks a range of degenerate solutions, and the network then gets nowhere near the true predictions.
To enforce this constraint I applied a sigmoid activation to the magnitude output. This sometimes works. However, the true magnitudes are quite near zero (order $10^{-4}$), which puts them in the tail of the sigmoid. This sometimes causes the loss and gradients to become NaN after 100 or so epochs. Is there a better approach to predicting complex numbers, or to constraining the outputs of a regression problem?
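For concreteness, this is roughly how my output heads are set up (a minimal sketch, assuming PyTorch; the input dimension, hidden size, and names are illustrative, not my exact architecture):

```python
import torch
import torch.nn as nn

class ComplexHead(nn.Module):
    """Predicts a, b, and the complex number c as (magnitude, phase)."""
    def __init__(self, in_dim: int = 8, hidden_dim: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.ab = nn.Linear(hidden_dim, 2)     # unconstrained a and b
        self.mag = nn.Linear(hidden_dim, 1)    # magnitude logit
        self.phase = nn.Linear(hidden_dim, 1)  # phase, unconstrained

    def forward(self, x):
        h = self.body(x)
        a, b = self.ab(h).unbind(dim=-1)
        # Sigmoid constrains the magnitude to (0, 1); the true values
        # (~1e-4) sit deep in the sigmoid's lower tail.
        mag = torch.sigmoid(self.mag(h)).squeeze(-1)
        phase = self.phase(h).squeeze(-1)
        return a, b, mag, phase
```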
I have tried lowering the learning rate, increasing the batch size, and using various different architectures, but with no luck. I have also tried leaving the output neuron unconstrained (no sigmoid), but this makes the problem so degenerate that the network cannot get close to the correct solution. Finally, I tried rescaling the magnitude targets to be larger (i.e., close to 0.5) and transforming the predictions back afterwards, as sketched below.
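The rescaling I tried was a simple affine transform of the targets so they sit near the middle of the sigmoid's range, undone at evaluation time (again a sketch; the scale factor is illustrative):

```python
SCALE = 5.0e3  # illustrative: maps magnitudes of order 1e-4 to roughly 0.5

def transform(mag_true):
    """Rescale the target magnitude toward the middle of (0, 1)."""
    return mag_true * SCALE

def inverse_transform(mag_pred):
    """Map the network's (0, 1) prediction back to the physical scale."""
    return mag_pred / SCALE
```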
For context, my loss function is
$$L = \frac{1}{n} \sum_{i=1}^{n} \left(f_{true} - f(a_{pred}, b_{pred}, |c_{pred}|, c_{pred}^{\phi})\right)^2$$
where $f_{true}$ is a known quantity, $a$ and $b$ are values we are trying to predict, and $c$ is the complex number we are trying to predict.
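In code, the loss is essentially the following (a sketch; `forward_model` is a hypothetical stand-in for the known function $f$):

```python
import torch

def loss_fn(f_true, a_pred, b_pred, mag_pred, phase_pred, forward_model):
    """Mean squared error between the known f_true and its reconstruction
    from the predicted parameters (a, b, |c|, phase of c)."""
    f_pred = forward_model(a_pred, b_pred, mag_pred, phase_pred)
    return torch.mean((f_true - f_pred) ** 2)
```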