I have a data frame with tens of thousands of rows and a thousand columns, along the lines of the following.
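A minimal synthetic stand-in (hourly timestamps as the index, sparse columns with column-specific gaps; the column names and values here are made up):

    import numpy as np
    import pandas as pd

    idx = pd.date_range("2008-01-01", periods=6, freq="h")
    dummydata = pd.DataFrame(
        {
            "A": [np.nan, 1.0, np.nan, 2.0, np.nan, 3.0],
            "B": [np.nan, np.nan, 4.0, np.nan, np.nan, np.nan],
            "C": [np.nan] * 6,  # some columns have no values yet
        },
        index=idx,
    )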
For an LSTM, I would like to
- extract only the non-NaN values per column,
- stack them together at the top of the DataFrame, newest value first, and
- pad with 0 for the period before the values started, up to index 99, so that every column has exactly 100 rows.

For example, a column whose non-NaN values are 2.0 and then 3.0 should become [3.0, 2.0, 0, 0, ..., 0] with 100 entries.
Note, however, that not every column has the same number of values: some already have many, some have none yet, and the timestamps at which the values appear are column specific. I did achieve the desired result with the code below, but it is really slow (an estimated 700 hours), because I want to compute this result for every hourly timestamp from 2008 to 2020, which is more than 100,000 timestamps, each one running the per-column loop over all 1,000 columns.
Is there any way to make the code significantly faster?
    import pandas as pd

    # i = cutoff timestamp, T = 100 = window length (both set in the outer per-timestamp loop)
    df1 = pd.DataFrame(index=range(100), columns=dummydata.columns)
    for j in dummydata.columns:
        # last T non-NaN values up to timestamp i, newest first
        df1[j] = dummydata.loc[dummydata.index <= i, j].dropna().iloc[-T:].iloc[::-1].reset_index(drop=True)
    df1 = df1.fillna(0)  # slots before the values started become 0
Can you try this to see if it is faster?
    # stack each column's non-NaN values at the top; NaN slots below become 0
    result = dummydata.apply(lambda x: pd.Series(x.dropna().values)).fillna(0)
Then you can keep only the first 100 rows using

    # .iloc is end-exclusive; .loc[0:100] would include label 100 and return 101 rows
    result.iloc[:100]
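Note that, unlike the loop in the question, this keeps each column's values oldest first and does not restrict the data to timestamps up to a cutoff i. If you need the exact output of the original loop (values up to i, newest first, always 100 rows), the same idea extends to that. A sketch, where the helper name window_at is made up and T = 100 as in the question:

    import pandas as pd

    T = 100  # window length from the question

    def window_at(dummydata, i):
        # keep only the history up to the cutoff timestamp i
        hist = dummydata[dummydata.index <= i]
        # per column: last T non-NaN values, newest first
        out = hist.apply(lambda s: pd.Series(s.dropna().to_numpy()[-T:][::-1]))
        # always return exactly T rows, 0-padded below the values
        return out.reindex(range(T)).fillna(0)

This still loops over the columns inside apply, so for all hourly timestamps from 2008 to 2020 it may additionally be worth precomputing each column's non-NaN values and their timestamps once, instead of calling dropna in every window.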