In SAS, PROC STANDARD allows users to standardise data to a given mean and standard deviation within each BY group.
Here, I want to standardise age to mean = 0 and standard deviation = 5 for each surname. How can I do this with a pandas DataFrame?
SAS code:
data mydata;
input surname $ name $ age ;
datalines;
Lim John 25
Lim David 100
Tan Mary 50
Tan Tom 30
;
run;
PROC STANDARD MEAN=0 STD=5 DATA=mydata OUT=mydata11;
VAR age;
BY surname;
run;
SAS output:
surname name age
Lim John -3.535533906
Lim David 3.5355339059
Tan Mary 3.5355339059
Tan Tom -3.535533906
Following this answer from stats.stackexchange (Transform Data to Desired Mean and Standard Deviation), we can define a function that rescales a series to the target mean and standard deviation, and apply it to each surname group through a lambda function:
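A minimal sketch of that idea, assuming the data is rebuilt by hand as a DataFrame. The helper name `standardize` and its parameters `new_mean` and `new_std` are made up for illustration. pandas' `Series.std()` uses the sample standard deviation (`ddof=1`) by default, which is what makes the result line up with the SAS output shown above.

```python
import pandas as pd

# Same rows as the SAS datalines block
df = pd.DataFrame({
    "surname": ["Lim", "Lim", "Tan", "Tan"],
    "name": ["John", "David", "Mary", "Tom"],
    "age": [25, 100, 50, 30],
})

def standardize(x, new_mean=0, new_std=5):
    # Hypothetical helper: z-score the series, then rescale to the target
    # mean and standard deviation. x.std() uses ddof=1 (sample std).
    return new_mean + (x - x.mean()) * new_std / x.std()

# transform() returns a result aligned with the original rows,
# so it can be assigned straight back to the column
df["age"] = df.groupby("surname")["age"].transform(lambda x: standardize(x, 0, 5))
print(df)
```

Using `transform` rather than `apply` keeps the original row order and index, so no merge back is needed. Note that a group with a single row has an undefined sample standard deviation, so its result would be NaN.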
We can confirm this by checking the mean and standard deviation of each group:
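One way to run that check, as a sketch that rebuilds the same frame so it works on its own. The column name `age_std` is an assumption chosen here so the raw ages stay available for comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "surname": ["Lim", "Lim", "Tan", "Tan"],
    "name": ["John", "David", "Mary", "Tom"],
    "age": [25, 100, 50, 30],
})

# Standardise age within each surname to mean 0 and standard deviation 5
df["age_std"] = df.groupby("surname")["age"].transform(
    lambda x: (x - x.mean()) * 5 / x.std()
)

# Per-group mean and sample standard deviation of the standardised column
check = df.groupby("surname")["age_std"].agg(["mean", "std"])
print(check)
```

Up to floating-point rounding, each group should show a mean of 0 and a standard deviation of 5.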