How do I reproduce an SGDClassifier with modified_huber loss?

Question

Asked by kekekekyle on 25 February 2024 at 18:57

I have a model defined like so:

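from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

rng = 42  # fixed seed for SGD's per-epoch shuffling
model = Pipeline([
    ('scaler', RobustScaler()),
    ('feature', SelectKBest(k=42)),
    ('model', SGDClassifier(loss='modified_huber', shuffle=True, random_state=rng))
])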

When I train and predict in two separate program executions (one run ad hoc, the other from a cron job) with the exact same inputs, I get different model weights and therefore different predictions.

I noticed that with 'hinge' loss the model is reproducible (exactly the same weights in both runs), but with the other loss functions it is not. What is it about the other loss functions that prevents them from being reproduced?

I've checked and double-checked that the inputs are the same, and I've tried other loss functions as well.
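Checking by eye can miss differences that disappear when values are printed with two decimals (12.799 prints as 12.80). A minimal sketch, assuming the feature matrix is a NumPy float array in both executions: log a SHA-256 fingerprint of the raw bytes in each run (ad hoc and cron), and if the fingerprints differ, list the exact cells that differ. The helper names `fingerprint` and `diff_report` are hypothetical, not part of scikit-learn.

```python
import hashlib

import numpy as np


def fingerprint(X):
    """Return a SHA-256 hex digest of the array's dtype, shape and raw bytes."""
    X = np.ascontiguousarray(X)
    h = hashlib.sha256()
    h.update(str(X.dtype).encode())
    h.update(str(X.shape).encode())
    h.update(X.tobytes())
    return h.hexdigest()


def diff_report(X1, X2):
    """Count cells that differ between two same-shaped arrays and report the largest gap."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    if X1.shape != X2.shape:
        raise ValueError(f"shape mismatch: {X1.shape} vs {X2.shape}")
    mismatch = X1 != X2
    rows, cols = np.nonzero(mismatch)
    max_gap = float(np.max(np.abs(X1 - X2))) if mismatch.any() else 0.0
    return {
        "n_diff": int(mismatch.sum()),
        "max_abs_diff": max_gap,
        "cells": list(zip(rows.tolist(), cols.tolist())),
    }


# Two "runs" that look identical when printed with 2 decimals.
run_a = np.array([[12.8, 1.0], [3.5, 4.25]])
run_b = np.array([[12.799, 1.0], [3.5, 4.25]])
print(fingerprint(run_a) == fingerprint(run_b))  # False: the bytes differ
print(diff_report(run_a, run_b))
```

Logging the fingerprint in both executions turns "the inputs are the same" into a one-line comparison; `diff_report` is only needed once the fingerprints disagree.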

**kekekekyle** · Answer 1 · 2024-03-10T20:32:46+00:00

OK, I've tracked it down. There were TINY differences between the X datasets: for example, a value of 12.799 in one run vs 12.8 in the other, but only a handful of them (fewer than 10 cells in a matrix with more than 3k rows and more than 200 columns). I didn't think such small differences would have such a large domino effect on the resulting models.
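A minimal sketch of that domino effect on synthetic data. It assumes `make_classification` in place of the asker's data (values scaled by 10 and rounded to 2 decimals so they look like 12.8), and `k=10` instead of `k=42` because the toy matrix has only 20 columns. The helpers `make_model` and `fitted_coef`, the choice of 8 cells and the 0.001 nudge are illustrative. It compares two fits on identical input bit for bit, then compares against a fit on a copy where a few cells in columns kept by `SelectKBest` are nudged down by 0.001 (e.g. 12.8 becomes 12.799).

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

warnings.filterwarnings("ignore", category=ConvergenceWarning)


def make_model(rng=42):
    # Same structure as the question's pipeline; k reduced to fit the toy data.
    return Pipeline([
        ('scaler', RobustScaler()),
        ('feature', SelectKBest(k=10)),
        ('model', SGDClassifier(loss='modified_huber', shuffle=True, random_state=rng)),
    ])


def fitted_coef(X, y):
    # Fit a fresh pipeline and return a copy of the SGD weight vector.
    return make_model().fit(X, y).named_steps['model'].coef_.copy()


X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_a = np.round(X * 10, 2)  # values with at most 2 decimals, e.g. 12.8

# Nudge cells only in columns SelectKBest keeps, so the change reaches SGD.
kept = make_model().fit(X_a, y).named_steps['feature'].get_support(indices=True)
rows = np.random.default_rng(0).choice(len(X_a), size=8, replace=False)
X_b = X_a.copy()
X_b[rows, kept[:8]] -= 0.001  # 8 cells, e.g. 12.8 -> 12.799

coef_1 = fitted_coef(X_a, y)
coef_2 = fitted_coef(X_a, y)
coef_b = fitted_coef(X_b, y)

print(np.array_equal(coef_1, coef_2))  # two fits on identical data, same seed
print(np.array_equal(coef_1, coef_b))  # same seed, 8 nudged cells
print(np.max(np.abs(coef_1 - coef_b)))  # size of the weight difference
```

With the seed fixed, identical input gives identical weights, so `random_state` is doing its job; the drift comes from the data. Differences confined to columns that `SelectKBest` drops never reach the SGD step, which is why the sketch deliberately nudges kept columns.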

Rounding all the data to 2 decimal places before training produced exactly the same models in both runs.
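A minimal NumPy sketch of why the rounding fix works for this case, and of its limit. The numbers are illustrative, not the asker's data. A value that is already exact at 2 decimals (12.8) and a nearby one (12.799) round to the same float, so both runs feed identical arrays to the pipeline. Two values that straddle a rounding boundary still end up different after rounding, even when they are closer together than 12.799 and 12.8.

```python
import numpy as np

# The same cell as seen in two runs: it differs in the third decimal.
a = np.array([12.8, 3.5, 0.25])
b = np.array([12.799, 3.5, 0.25])
print(np.array_equal(a, b))                            # False: raw inputs differ
print(np.array_equal(np.round(a, 2), np.round(b, 2)))  # True: both become 12.8

# Values only 0.0002 apart can still round to different results.
c = np.array([12.3449])
d = np.array([12.3451])
print(np.round(c, 2), np.round(d, 2))                  # [12.34] [12.35]
```

Rounding therefore hides small discrepancies only when they do not cross a rounding boundary. Comparing the raw inputs of the two executions directly is the more reliable check.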