Age range to numerical values to calculate correlation of CD consumption with age range

Question

Asked by lelelel on 27 March 2024 at 22:36

I have sorted the values, but the problem is the 'до 25' (up to 25) group. How can I change it to '0-25' and calculate the correlation coefficient between age group and overall rating?

Some of my data is below:

Age group	Overall rating
65 and older	38.45
55-64	17.66
up to 25	46.56
45-54	24.95
35-44	33.54
25-34	37.21

Original Q&A

There is 1 answer

**Jamie** · Accepted Answer · 2024-03-28T03:36:07+00:00

Below is one way to do what you ask. I converted each age category to the midpoint of its range (the Mean_Age column), because Pearson correlation needs two numeric variables; a category label will not work. There are some other problems with your data. It is unclear what the '65 and older' class covers numerically. I treated it as 65-100, but that may not be the case. Your categories are also written as 25-34, for example. If the ranges are meant to be half-open (lower bound included, upper bound excluded), that should be 25-35, which contains 25, 26, 27, 28, 29, 30, 31, 32, 33 and 34 but not 35, and I think that is what you are trying to achieve. I did not change this in the code, but you should change it if that is what you mean.

import pandas as pd
from scipy.stats import pearsonr
import warnings
warnings.filterwarnings("ignore")

# display() is provided by Jupyter/IPython; in a plain script use print(df) instead

Agelst=['65 and older','55-64','up to 25','45-54','35-44','25-34']
Ratelst=[38.45,17.66,46.56,24.95,33.54,37.21]

df=pd.DataFrame()
df['Age_Group']=Agelst
df['Overal_Rating']=Ratelst

display(df)

#Change 'up to 25' to '0-25' and '65 and older' to '65-100' (the upper bound of 100 is an assumption)
df.replace('up to 25', '0-25',inplace=True)
df.replace('65 and older', '65-100',inplace=True)

display(df)

#You will need a numeric age to use for correlation.  We can develop one from the strings in your 'Age_Group'
loweragelst=[]
upperagelst=[]
for i in range(len(df)):
    loweragelst.append(int(((df.iloc[i]['Age_Group']).split('-'))[0]))
    upperagelst.append(int(((df.iloc[i]['Age_Group']).split('-'))[1]))

df['Lower_Age']=loweragelst
df['Upper_Age']=upperagelst

#Sort the df
df.sort_values(by=['Lower_Age'], ascending=True,inplace=True)
display(df)

#Add a mean age column (the midpoint of each range) to use for correlation
df['Mean_Age']=(df['Lower_Age']+df['Upper_Age'])/2

display(df)

#Calculate Pearson's Correlation
X=df['Mean_Age']
Y=df['Overal_Rating']
PCor= pearsonr(X, Y)
print(PCor)

The resulting df and correlation are:

      Age_Group  Overal_Rating
0  65 and older          38.45
1         55-64          17.66
2      up to 25          46.56
3         45-54          24.95
4         35-44          33.54
5         25-34          37.21

  Age_Group  Overal_Rating
0    65-100          38.45
1     55-64          17.66
2      0-25          46.56
3     45-54          24.95
4     35-44          33.54
5     25-34          37.21

  Age_Group  Overal_Rating  Lower_Age  Upper_Age
2      0-25          46.56          0         25
5     25-34          37.21         25         34
4     35-44          33.54         35         44
3     45-54          24.95         45         54
1     55-64          17.66         55         64
0    65-100          38.45         65        100

  Age_Group  Overal_Rating  Lower_Age  Upper_Age  Mean_Age
2      0-25          46.56          0         25      12.5
5     25-34          37.21         25         34      29.5
4     35-44          33.54         35         44      39.5
3     45-54          24.95         45         54      49.5
1     55-64          17.66         55         64      59.5
0    65-100          38.45         65        100      82.5

PearsonRResult(statistic=-0.4489402583278369, pvalue=0.37183097344063043)
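
Because the upper bound of the '65 and older' class is a guess, it may be worth checking how much the coefficient depends on it. The following is a minimal sketch, not a definitive implementation: it wraps the same midpoint approach in a helper and recomputes Pearson's r for a few assumed upper bounds. The function name `midpoint_correlation` and its parameters `oldest_age` and `youngest_age` are hypothetical names introduced here for illustration, and the bounds tried (100, 90, 80) are arbitrary assumptions, not values from the data.

```python
import pandas as pd
from scipy.stats import pearsonr

# Same data as in the question.
df = pd.DataFrame({
    "Age_Group": ["65 and older", "55-64", "up to 25", "45-54", "35-44", "25-34"],
    "Overal_Rating": [38.45, 17.66, 46.56, 24.95, 33.54, 37.21],
})


def midpoint_correlation(frame, oldest_age=100, youngest_age=0):
    """Return (r, p-value) for bin midpoints versus ratings.

    oldest_age and youngest_age are ASSUMPTIONS used to close the two
    open-ended bins; they are not part of the original data.
    """
    # Rewrite the open-ended labels as 'lower-upper' strings.
    groups = frame["Age_Group"].replace({
        "up to 25": f"{youngest_age}-25",
        "65 and older": f"65-{oldest_age}",
    })
    # Split every label into two integer columns: lower and upper bound.
    bounds = groups.str.split("-", expand=True).astype(int)
    # The midpoint of each bin is the row-wise mean of its two bounds.
    midpoints = bounds.mean(axis=1)
    r, p_value = pearsonr(midpoints, frame["Overal_Rating"])
    return r, p_value


# Recompute the correlation for several assumed upper bounds of the oldest bin.
for oldest in (100, 90, 80):
    r, p_value = midpoint_correlation(df, oldest_age=oldest)
    print(f"65-{oldest}: r={r:.4f}, p={p_value:.4f}")
```

With `oldest_age=100` this reproduces the coefficient shown above. If the value moves noticeably for other bounds, the result is sensitive to that assumption and should be reported together with the bound you chose.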
