I am trying to count the number of equal rows in a pandas DataFrame (i.e. build a frequency table), which is then used to calculate the k-anonymity of a dataset.
I have a special requirement for how missing values are counted: a missing value should count towards all other classes (as the missing value "could" be any value). In addition, the count for a record with missing values is the number of possible combinations regarding the missing values. Values should be treated as categorical.
Given such a DataFrame, the count (below denoted as f_k) should look like
With pandas value_counts, I get
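import numpy as np
import pandas as pd

d = {
    'key1': [1, 1, 2, np.nan],
    'key2': [1, 1, 1, 1],
    'key3': [3, np.nan, 3, np.nan]
}
df = pd.DataFrame(data=d)
# Convert to nullable integers first so the categories are 1, 2, 3 rather than 1.0, 2.0, 3.0
df["key1"] = df["key1"].astype("Int64").astype("category")
df["key2"] = df["key2"].astype("Int64").astype("category")
df["key3"] = df["key3"].astype("Int64").astype("category")
# Count identical rows; dropna=False keeps rows that contain missing values
(
    df
    .value_counts(dropna=False)
    .reset_index()
)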
Any idea how to achieve this in pandas?
This works but is time-consuming: