How to train continuous outputs (regression) in machine learning


I want to train a regression model (not classification) whose outputs are continuous numbers.

Let's say I have an input variable, X, which ranges between -70 and 70, and an output variable, Y, which ranges between -5 and 5. X has 39 features and Y has 16 features, and there are 50,000 examples of each. I would like to train them with a deep belief network (DBN) in Python.

I used the script from the Theano homepage which describes a DBN on MNIST data (classification): http://deeplearning.net/tutorial/DBN.html

Could you tell me which specific lines I have to change in order to apply it to the regression problem I explained above?

For example, do I have to change...

  1. The sigmoid function to the tanh function? I heard the tanh activation function works better than sigmoid for regression. Is that right?
  2. Instead of using the negative log-likelihood, do I have to use a least-squares error? (I sketched what I mean by 1 and 2 below.)
  3. Should I normalize the input and output data by z-score?
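
To make it concrete, here is roughly what I imagine changes 1 and 2 would look like for the output layer: a linear output with a least-squares cost. This is only my sketch in plain Theano, not code from the tutorial, so please correct me if it is wrong:

import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')   # minibatch of inputs, shape (batch_size, 39)
y = T.matrix('y')   # minibatch of targets, shape (batch_size, 16)

rng = np.random.RandomState(1234)
W = theano.shared(np.asarray(rng.uniform(-0.01, 0.01, (39, 16)),
                             dtype=theano.config.floatX), name='W')
b = theano.shared(np.zeros(16, dtype=theano.config.floatX), name='b')

# linear output layer instead of the tutorial's logistic-regression layer
y_pred = T.dot(x, W) + b

# least-squares (mean squared error) cost instead of negative log-likelihood
cost = T.mean(T.sum(T.sqr(y_pred - y), axis=1))

g_W, g_b = T.grad(cost, [W, b])
learning_rate = 0.01
train_step = theano.function(
    inputs=[x, y], outputs=cost,
    updates=[(W, W - learning_rate * g_W), (b, b - learning_rate * g_b)])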

Please let me know if you have any idea how to solve this problem. Most machine learning examples are based on MNIST handwritten digit classification. I would be happy if you could recommend some good blogs or websites where I can find useful information about regression.

Thank you in advance.


There is 1 answer

Arvind Kumar
@hyungwon yang: 

I haven't seen the Python code, but I think the following would be useful:

Sigmoid function to tanh function: Not necessary; many publications use sigmoid for nonlinear regression. To be frank, the choice should be made based on the type of data you have. I have used sigmoid for many nonlinear models and it has worked for me.
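
For what it's worth, the two activations are closely related: tanh is just a rescaled sigmoid, so switching between them mainly changes the output range from (0, 1) to (-1, 1), not the modelling power. In NumPy terms (a small sketch of my own):

import numpy as np

def sigmoid(x):
    # logistic sigmoid, output in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh_via_sigmoid(x):
    # tanh is a scaled and shifted sigmoid, output in (-1, 1)
    return 2.0 * sigmoid(2.0 * x) - 1.0

x = np.linspace(-3, 3, 7)
print(np.allclose(tanh_via_sigmoid(x), np.tanh(x)))  # True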

Least-squares error: You can do this with MATLAB's built-in regress function instead of confusing yourself with so many parameters.
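
Since your question is about Python, a rough equivalent of regress as a linear least-squares baseline would be NumPy's lstsq. This is just a sketch on my side (the array shapes are taken from your question, not from the Theano tutorial):

import numpy as np

# X: (50000, 39) inputs, Y: (50000, 16) targets, as described in the question
X = np.random.uniform(-70, 70, (50000, 39))
Y = np.random.uniform(-5, 5, (50000, 16))

# append a bias column, then solve the ordinary least-squares problem
A = np.hstack([X, np.ones((X.shape[0], 1))])
coeffs, residuals, rank, sv = np.linalg.lstsq(A, Y, rcond=None)

Y_pred = A.dot(coeffs)              # fitted values
mse = np.mean((Y_pred - Y) ** 2)    # least-squares error of the fit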

Normalisation: You can do a min-max normalisation (refer to Data Mining by Charu Aggarwal); my implementation in MATLAB is as follows:

% inputData: a vector of N samples for a single feature; call this
% function iteratively for each and every feature.
function normOut = NormaliseAnyData(inputData)
    denominator = max(inputData) - min(inputData);
    numerator   = inputData - min(inputData);
    normOut     = numerator / denominator;
end
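
If you want to stay in Python, the same min-max normalisation in NumPy would look roughly like this (my sketch, assuming one column per feature):

import numpy as np

def normalise_any_data(input_data):
    # min-max normalisation of a 1-D feature vector to the range [0, 1]
    input_data = np.asarray(input_data, dtype=float)
    denominator = input_data.max() - input_data.min()
    return (input_data - input_data.min()) / denominator

# apply column-wise, e.g. to the (50000, 39) input matrix X from the question:
# X_norm = np.apply_along_axis(normalise_any_data, 0, X)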

Hope it helps. Let me know if you have further questions.