MinMaxScaler output differs between training and validation sets


I have scaled my dataset using the MinMaxScaler from sklearn like this:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# create a MinMaxScaler object
self.scaler = MinMaxScaler(feature_range=(0, 1))

# fit the scaler to the training dataset
self.scaler.fit(self.X_org)

# transform the dataset using the fitted scaler
self.X_scalled = pd.DataFrame(self.scaler.transform(self.X_org), columns=self.X_org.columns)

return self.X_scalled

However, I am now using the last 10% of the entire dataset for a validation run, scaling that data with the scaler fitted on the training dataset like so:

X_input_val_data_scalled = pd.DataFrame(self.scaler.transform(X_input_val_data), columns=X_input_val_data.columns)

Now my challenge:

In the training X_org set I get a nicely scaled dataset ranging from 0 to 1. In the scaled validation X dataset I get completely weird data ranging from 7.5 to 8...

What am I doing wrong?

1 Answer

Answer by Taha Akbari:

That is actually how it is supposed to work: a min-max scaler does the scaling as below:

(data - min(x)) / (max(x) - min(x))

where x is the data the min-max scaler was trained on. If data belongs to x, then data - min(x) is a non-negative number no larger than max(x) - min(x), so the ratio lies between 0 and 1. Otherwise, which is the case for your validation data, the ratio doesn't have to be between 0 and 1.
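A minimal sketch of this effect (using made-up values, not the asker's actual data): a scaler fitted on a training range maps any value outside that range to a result outside [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Fit the scaler on training values spanning 0 to 10
X_train = np.array([[0.0], [5.0], [10.0]])
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(X_train)

# Training data lands inside [0, 1] as expected
print(scaler.transform(X_train).ravel())  # [0.  0.5 1. ]

# Validation values outside the training range fall outside [0, 1]:
# (12 - 0) / (10 - 0) = 1.2 and (-2 - 0) / (10 - 0) = -0.2
X_val = np.array([[12.0], [-2.0]])
print(scaler.transform(X_val).ravel())  # [ 1.2 -0.2]
```

So scaled validation values of 7.5 to 8 simply mean those raw values lie far above the maximum the scaler saw during fitting.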