I would like to add significance stars (based on p-values) to the autocorrelations of each column in a DataFrame.
How can I place the significance stars next to each autocorrelation coefficient?
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import acf

def autocorr_with_asterisks(df):
    """
    Calculate autocorrelation coefficients and add significance asterisks.

    Parameters:
    df (DataFrame): Input DataFrame with time series data.

    Returns:
    DataFrame: DataFrame containing autocorrelation coefficients with significance asterisks.
    """
    autocorr_df = pd.DataFrame
    asterisks = []
    for col in df.columns:
        acf_vals = acf(df[col],nlags=9, qstat=True)
        autocorr_df[col] = acf_vals[0]
        col_asterisks = []
        for p_val in acf_vals[2]:
            if p_val < 0.01:
                col_asterisks.append('***')
            elif p_val < 0.05:
                col_asterisks.append('**')
            elif p_val < 0.1:
                col_asterisks.append('*')
            else:
                col_asterisks.append('')
        asterisks.append(col_asterisks)
    autocorr_df_with_asterisks = autocorr_df.astype(str) + np.array(asterisks).T
    return autocorr_df_with_asterisks
Sample Data:
data = np.zeros((200, 5))
drift = 0.1
for col in range(5):
    for i in range(1, 200):
        data[i, col] = data[i - 1, col] + drift + np.random.randn()
# df follows an AR(1) process
df = pd.DataFrame(data, columns=['Column_1', 'Column_2', 'Column_3', 'Column_4', 'Column_5'])
You weren't far off in your solution. Here is one way to solve it. I created a new DataFrame to show an example. Note also that I made the number of lags a parameter of the function, so that you can change it if you need to.
which results in
In this case there is just one asterisk, but with better real data it should produce what you want.