Fill Pandas Column NaNs with numpy array values

Sorry if this question seems too basic, but I've been looking for an answer and haven't found one.

So, I have a dataset with lots of NaN values, and I've been working on some regressions to predict those nulls. Since the predictions come back as a numpy.ndarray, I've been trying to fill the gaps in the column with those arrays, with no success.

I mean, the column is something like this:

     Records
101       21
102       22
103       23
104       24
106      NaN
107      NaN
108      NaN
109      NaN
110      NaN
111       29
112       30

The array is:

   y_pred = [25, 26, 27, 28]

So, fillna doesn't accept NumPy arrays, and my attempts to pass the array as a dict, a pandas column, etc. didn't work either.

Also, the other issue is the length of the array, which will always be different from the length of the original column.
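
For anyone who wants to reproduce the setup, here is a minimal sketch of the column and the predictions. The index labels and values are copied from the example above; wrapping y_pred in a NumPy array is an assumption about what the regression returns.

```python
import numpy as np
import pandas as pd

# Index labels and values copied from the example column (there is no label 105)
index = [101, 102, 103, 104, 106, 107, 108, 109, 110, 111, 112]
records = [21, 22, 23, 24, np.nan, np.nan, np.nan, np.nan, np.nan, 29, 30]
df = pd.DataFrame({"Records": records}, index=index)

# Assumption: the regression output arrives as a 1-D numpy.ndarray
y_pred = np.array([25, 26, 27, 28])

# Number of gaps in the column vs. number of predictions available
n_missing = int(df["Records"].isna().sum())
print(n_missing, len(y_pred))
```

With this data the two counts differ (5 gaps, 4 predictions), which is the length mismatch described above.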

I appreciate your insights.

Best answer, by jezrael

First, if you want to replace all missing values with all values of the array, the number of missing values must equal the length of the array:

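# One value (30) added so the array length matches the 5 NaNs
y_pred = [25, 26, 27, 28, 30]
# Boolean mask marking the rows where 'Records' is NaN
m = df['Records'].isna()

# Assign the predictions to the masked rows, top to bottom
df.loc[m, 'Records'] = y_pred
print(df)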
     Records
101     21.0
102     22.0
103     23.0
104     24.0
106     25.0
107     26.0
108     27.0
109     28.0
110     30.0
111     29.0
112     30.0
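
Since the direct assignment relies on the two lengths agreeing, it can help to check them explicitly first. A minimal sketch, rebuilding the example frame so the snippet runs on its own; the error message wording is made up for illustration:

```python
import numpy as np
import pandas as pd

# Example data from the question (no label 105)
index = [101, 102, 103, 104, 106, 107, 108, 109, 110, 111, 112]
df = pd.DataFrame({"Records": [21, 22, 23, 24] + [np.nan] * 5 + [29, 30]}, index=index)
y_pred = np.array([25, 26, 27, 28, 30])

m = df["Records"].isna()

# Fail early with a readable message instead of relying on the assignment to complain
if m.sum() != len(y_pred):
    raise ValueError(f"{m.sum()} NaNs but {len(y_pred)} predictions")

# Lengths agree, so the predictions go into the NaN rows in order
df.loc[m, "Records"] = y_pred
```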

If the lengths may not match, create a helper Series trimmed to the shorter of the two lengths and pass it to Series.fillna:

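Here the array is shorter than the number of NaNs:

import pandas as pd

# df is the original DataFrame again (5 NaNs in 'Records')
y_pred = [25, 26, 27, 28]

m = df['Records'].isna()

len_nan = m.sum()
len_arr = len(y_pred)

# Trim both sides: at most len_nan values, at most len_arr NaN labels
s = pd.Series(y_pred[:len_nan], index=df.index[m][:len_arr])
print(s)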
106    25
107    26
108    27
109    28
dtype: int64

# fillna matches s to the column by index label, so only those rows are filled
df['Records'] = df['Records'].fillna(s)
print(df)
     Records
101     21.0
102     22.0
103     23.0
104     24.0
106     25.0
107     26.0
108     27.0
109     28.0
110      NaN
111     29.0
112     30.0
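
Series.fillna matches the helper Series to the column by index label, not by position, which is why the row labelled 110 stays NaN in the output above. A small sketch that makes this visible by passing the same helper Series in reversed row order (the frame is rebuilt here so the snippet is self-contained):

```python
import numpy as np
import pandas as pd

# Example data from the question (no label 105)
index = [101, 102, 103, 104, 106, 107, 108, 109, 110, 111, 112]
df = pd.DataFrame({"Records": [21, 22, 23, 24] + [np.nan] * 5 + [29, 30]}, index=index)
y_pred = [25, 26, 27, 28]

m = df["Records"].isna()
# Helper Series labelled with the first len(y_pred) NaN positions
s = pd.Series(y_pred, index=df.index[m][:len(y_pred)])

filled = df["Records"].fillna(s)
# Reversing the row order of s does not change the result: labels drive the fill
filled_reversed = df["Records"].fillna(s.iloc[::-1])

print(filled.equals(filled_reversed))
# Labels that are still missing after the fill
print(filled[filled.isna()].index.tolist())
```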

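Here the array is longer than the number of NaNs:

# df is the original DataFrame again (5 NaNs in 'Records')
y_pred = [25, 26, 27, 28, 100, 200, 300]

m = df['Records'].isna()

len_nan = m.sum()
len_arr = len(y_pred)

# Only the first len_nan values are kept; the extras (200, 300) are dropped
s = pd.Series(y_pred[:len_nan], index=df.index[m][:len_arr])
print(s)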
106     25
107     26
108     27
109     28
110    100
dtype: int64

# All 5 NaNs have a matching label in s, so none remain after the fill
df['Records'] = df['Records'].fillna(s)
print(df)
     Records
101     21.0
102     22.0
103     23.0
104     24.0
106     25.0
107     26.0
108     27.0
109     28.0
110    100.0
111     29.0
112     30.0
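
Both cases can be folded into one helper by taking the smaller of the two lengths. This is only a sketch: the function name fill_nans_in_order and its signature are made up for illustration and are not part of pandas, and it assumes the index labels are unique.

```python
import numpy as np
import pandas as pd


def fill_nans_in_order(col: pd.Series, values) -> pd.Series:
    """Fill NaNs in `col` top to bottom with `values`.

    Surplus values are ignored; surplus NaNs are left as NaN.
    Hypothetical helper; assumes `col` has unique index labels.
    """
    values = np.asarray(values)
    nan_labels = col.index[col.isna()]
    # Use only as many values/labels as both sides can supply
    n = min(len(nan_labels), len(values))
    helper = pd.Series(values[:n], index=nan_labels[:n])
    return col.fillna(helper)


# Example data from the question (no label 105)
index = [101, 102, 103, 104, 106, 107, 108, 109, 110, 111, 112]
col = pd.Series([21, 22, 23, 24] + [np.nan] * 5 + [29, 30], index=index, name="Records")

filled_short = fill_nans_in_order(col, [25, 26, 27, 28])
filled_long = fill_nans_in_order(col, [25, 26, 27, 28, 100, 200, 300])
print(filled_short.isna().sum(), filled_long.loc[110])
```

The helper returns a new Series rather than modifying the column, so the result would be assigned back with something like df['Records'] = fill_nans_in_order(df['Records'], y_pred).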