I am trying to calculate the average opening price for a stock, depending on different periods (week, month, year).
Here you can see a part of my df : My dataframe (987 rows for the complete df)
Firstly, I am trying to calculate the average opening price week by week. I found a solution, but it is unsustainable (it took my computer 5min to finish the calculations). Here it is :
def average_opening_and_closing_prices(df):
array = [0]
n = df["weekofyear"].count()
j=0
for i in range(0,n):
array[j] = array[j] + kdf["Open"][i]
if i != n-1 and kdf["weekofyear"][i] != kdf["weekofyear"][i+1]:
array.append(0)
j = j+1
for x in array:
print(str(x) + " ")
average_opening_and_closing_prices(AMAZON_df)
Could you help me to improve my solution (mainly on execution time) ? Also, for example, I would like to add a column, directly to my df, which contains the results for each week, instead of putting the results in an array.
I am not allowed to use pandas, I can only use pyspark and koalas.
[UPDATED: To include year into the calculation] As you are looking for average price for week (and year) and already added the weekofyear in data frame, panda's itself can do it for you. Just add a column for year and try
df.groupby(['year', 'weekofyear']).mean()
Sample below:Output