pandas frequency table with missing values

Question

pandas frequency table with missing values

66 views Asked by Raphael Roth At 12 July 2023 at 14:20

I try to calculate the number of equal rows in a pandas dataframe (i.e. a frequency table) which is used to calculate the k-anonymity of a dataset

I have a special requirement regarding the counting of missing values : A missing value should count towards all other classes (as the missing value "could" be any value). In addition, the count of the record with missing values is the number of possible combinations regarding the missing values. Values should be taken as categorical

Given such a DataFrame, the count (below denoted as f_k) should look like

With pandas value_counts, I get

d = {
    'key1': [1,1,2,np.nan],
    'key2': [1,1,1,1],
    'key3': [3,np.nan,3,np.nan]
    }

df = pd.DataFrame(data=d)
df["key1"] = df["key1"].astype("Int64").astype('category')
df["key2"] = df["key2"].astype('Int64').astype('category')
df["key3"] = df["key3"].astype('Int64').astype('category')

df
.value_counts(dropna=False)
.reset_index()

Any idea how to achieve this in pandas?

Original Q&A

There are 1 answers

**Bluegirl FK** · Answer 1 · 2023-07-12T15:54:00+00:00

This works but time consuming:

import pandas as pd
import numpy as np


data = {
    'key1': [1, 1, 2, np.nan],
    'key2': [1, 1, 1, 1],
    'key3': [3, np.nan, 3, np.nan]
}
df = pd.DataFrame(data)

fk_lst = []
for index, row in df.iterrows():
    non_nan_columns = row[row.notna()].index.tolist()
    df = df[non_nan_columns]
    for col in df.columns:
        df[col] = df[col].fillna(row[col])
    count = df.value_counts(dropna=False).reset_index()
    count = int(count[['count']].iloc[0])
    fk_lst.append(count)
    df = pd.DataFrame(data)

df['f_k'] = fk_lst

TechQA.

pandas frequency table with missing values

There are 1 answers

Related Questions in PANDAS

Related Questions in ANONYMITY

Popular Questions

Popular Tags

Trending Questions