I have a dataset that contains URLs, each with a publish date (YYYY-MM-DD) and a visit count. I want to calculate a benchmark (average) of visits for a complete year. The pages were published on different dates, so their raw visit counts are not directly comparable. For example, a page published in August with 10,000 visits should carry more weight than a page published in March with 11,000 visits, because it has had less time to accumulate visits.
Here is my dataset:
First step:
First, I want to add a column (a time frame) to my dataset that is calculated from the publish date. For example, if a page was published on 2019-12-10, the column should give the duration from that date up to today's date. Expected output: (Dec 2019, 9 Months), i.e. (month and year in which the page was published, total months up to today).
Second step:
I want to normalize/rescale my data (visits) based on the time frame column calculated in step 1.
How can I calculate the average/benchmark?
For the first step, you can use the following code. First, read the dataframe:
My example dataframe is as follows:
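As a stand-in for the example data, the sketch below builds a small dataframe from inline CSV text. The column names `url`, `publish_date`, and `visits`, and all of the values, are hypothetical and chosen only to match the description in the question; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical sample data; replace with the path to your own file.
csv_text = """url,publish_date,visits
/page-a,2019-12-10,9000
/page-b,2020-03-05,11000
/page-c,2020-08-20,10000
"""

# parse_dates converts the publish date column to datetime64 on load.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["publish_date"])
print(df)
```

Parsing the dates at load time means the later steps can work with real datetime values rather than strings.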
To normalize the date, first define a function that normalizes a single date:
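A minimal sketch of such a function, assuming that "normalizing" a date here means turning it into the time-frame string the question asks for. The name `time_frame` and its `today` parameter are hypothetical; `today` is included so the result can be reproduced with a fixed reference date.

```python
import pandas as pd


def time_frame(publish_date, today=None):
    """Return a string such as 'Dec 2019, 9 Months' for one publish date."""
    published = pd.Timestamp(publish_date)
    # Fall back to the current date when no reference date is given.
    today = pd.Timestamp.today() if today is None else pd.Timestamp(today)
    # Count calendar months between the two dates; the day of month is ignored.
    months = (today.year - published.year) * 12 + (today.month - published.month)
    return f"{published.strftime('%b %Y')}, {months} Months"


print(time_frame("2019-12-10", today="2020-09-15"))
```

Counting calendar months rather than dividing days by 30 is a design choice: it matches the "9 Months" in the question's example when today falls in September 2020.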
Now apply this function to every value in the date column:
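One way this could look, as a self-contained sketch. The dataframe, its column names, the helper `months_since`, and the fixed reference date are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data matching the question's description.
df = pd.DataFrame({
    "url": ["/page-a", "/page-b", "/page-c"],
    "publish_date": pd.to_datetime(["2019-12-10", "2020-03-05", "2020-08-20"]),
    "visits": [9000, 11000, 10000],
})

# Fixed reference date so the output is reproducible; use pd.Timestamp.today() in practice.
today = pd.Timestamp("2020-09-15")


def months_since(published, today):
    """Calendar months from the publish date to the reference date."""
    return (today.year - published.year) * 12 + (today.month - published.month)


# Apply the per-date function to every row of the date column.
df["months"] = df["publish_date"].apply(lambda d: months_since(d, today))
# Build the '(Month Year, N Months)' style label requested in the question.
df["time_frame"] = (
    df["publish_date"].dt.strftime("%b %Y") + ", " + df["months"].astype(str) + " Months"
)
print(df)
```

Keeping the month count in its own numeric column, in addition to the label, makes it available for rescaling the visits later.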
The normalized dataframe will be:
For the second step, if there are multiple records per month, group by the date and any other columns you need, then take their mean:
The result will be:
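The grouping step could be sketched as below. The sample data and column names are hypothetical. The rescaling shown (visits divided by months since publication) is only one possible reading of "normalize", not something the answer specifies, and treating pages published in the current month as one month old is a further assumption made to avoid dividing by zero.

```python
import pandas as pd

# Hypothetical sample data with two pages published in the same month.
df = pd.DataFrame({
    "url": ["/a", "/b", "/c", "/d"],
    "publish_date": pd.to_datetime(["2019-12-10", "2019-12-20", "2020-03-05", "2020-08-20"]),
    "visits": [9000, 4500, 12000, 10000],
})
today = pd.Timestamp("2020-09-15")  # fixed reference date for reproducibility

# Months since publication; clip to at least 1 (assumption) to avoid division by zero.
df["months"] = (
    (today.year - df["publish_date"].dt.year) * 12
    + (today.month - df["publish_date"].dt.month)
).clip(lower=1)

# Assumed normalization: average visits per month since publication.
df["visits_per_month"] = df["visits"] / df["months"]

# Group by publish month and take the mean of the numeric columns.
df["publish_month"] = df["publish_date"].dt.to_period("M")
monthly = df.groupby("publish_month")[["visits", "visits_per_month"]].mean()
print(monthly)

# Overall benchmark: mean of the normalized visits across all pages.
benchmark = df["visits_per_month"].mean()
print(benchmark)
```

With this rescaling, the page published in August ends up with the highest per-month figure, which is in line with the weighting described in the question.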