Creating a dictionary and adding the number of occurrences from another dataset


I need help writing a for loop that adds the number of times an element appears in a dataset to the values of a dictionary built with a dictionary comprehension.

Here is the sample dataset:

salary_data = 
{'Age': '39', 'Education': 'E - Bachelors', 'Occupation': 'Adm-clerical', 'Relationship': 'Not-in-family', 'Race': 'White', 'Sex': 'Male', 'Target': '<=50K'}
{'Age': '50', 'Education': 'E - Bachelors', 'Occupation': 'Exec-managerial', 'Relationship': 'Husband', 'Race': 'White', 'Sex': 'Male', 'Target': '<=50K'}
{'Age': '38', 'Education': 'B - HS Diploma', 'Occupation': 'Handlers-cleaners', 'Relationship': 'Not-in-family', 'Race': 'White', 'Sex': 'Male', 'Target': '<=50K'}
{'Age': '53', 'Education': 'A - No HS Diploma', 'Occupation': 'Handlers-cleaners', 'Relationship': 'Husband', 'Race': 'Black', 'Sex': 'Male', 'Target': '<=50K'}
{'Age': '28', 'Education': 'E - Bachelors', 'Occupation': 'Prof-specialty', 'Relationship': 'Wife', 'Race': 'Black', 'Sex': 'Female', 'Target': '<=50K'}
{'Age': '37', 'Education': 'F - Graduate Degree', 'Occupation': 'Exec-managerial', 'Relationship': 'Wife', 'Race': 'White', 'Sex': 'Female', 'Target': '<=50K'}
{'Age': '49', 'Education': 'A - No HS Diploma', 'Occupation': 'Other-service', 'Relationship': 'Not-in-family', 'Race': 'Black', 'Sex': 'Female', 'Target': '<=50K'}
{'Age': '52', 'Education': 'B - HS Diploma', 'Occupation': 'Exec-managerial', 'Relationship': 'Husband', 'Race': 'White', 'Sex': 'Male', 'Target': '>50K'}
{'Age': '31', 'Education': 'F - Graduate Degree', 'Occupation': 'Prof-specialty', 'Relationship': 'Not-in-family', 'Race': 'White', 'Sex': 'Female', 'Target': '>50K'}
{'Age': '42', 'Education': 'E - Bachelors', 'Occupation': 'Exec-managerial', 'Relationship': 'Husband', 'Race': 'White', 'Sex': 'Male', 'Target': '>50K'}

and a list of unique education levels was given:

unique_education_levels = [
    'A - No HS Diploma',
    'B - HS Diploma',
    'C - Some College',
    'D - Associates',
    'E - Bachelors',
    'F - Graduate Degree',
]

I need to create a dictionary called education_level_frequencies where the keys are the unique education levels and the values are the number of times the education level appears in the dataset.

So far I used a dictionary comprehension to create the dictionary with values of 0.

education_level_frequencies = [{level: 0} for level in unique_education_levels]

I'm trying to use a for loop to iterate through the dataset and add +1 to the education_level_frequencies keys to no avail.

for entry in salary_data:
    if entry['Education'] == education_level_frequencies:
        education_level_frequencies[entry] += 1

There are 3 answers

A. Guy On

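With a for loop, what you probably meant to write was the following (this assumes education_level_frequencies is a single dictionary keyed by education level, not a list of one-item dictionaries):

for entry in salary_data:
    if entry['Education'] in education_level_frequencies:
        education_level_frequencies[entry['Education']] += 1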
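
For reference, here is a self-contained sketch of that loop. It assumes the zero-initialised dictionary is built with braces (a dict comprehension) rather than brackets, and it uses a shortened, made-up three-row sample in place of the full dataset; only the 'Education' key is needed.

```python
unique_education_levels = [
    'A - No HS Diploma', 'B - HS Diploma', 'C - Some College',
    'D - Associates', 'E - Bachelors', 'F - Graduate Degree',
]

# Shortened sample for illustration only (not the full dataset).
salary_data = [
    {'Education': 'E - Bachelors'},
    {'Education': 'B - HS Diploma'},
    {'Education': 'E - Bachelors'},
]

# One dictionary mapping every known level to 0 (braces, not brackets).
education_level_frequencies = {level: 0 for level in unique_education_levels}

for entry in salary_data:
    level = entry['Education']
    # Only count levels that are keys of the dictionary.
    if level in education_level_frequencies:
        education_level_frequencies[level] += 1

print(education_level_frequencies)
```

Because every level starts at 0, levels that never occur in the data stay in the result with a count of 0.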
Sash Sinha On

It looks like unique_education_levels is redundant since the keys in a dictionary have to be unique.

You could use collections.Counter or collections.defaultdict:

from collections import Counter, defaultdict

salary_data = [
    {'Age': '39', 'Education': 'E - Bachelors', 'Occupation': 'Adm-clerical', 'Relationship': 'Not-in-family', 'Race': 'White', 'Sex': 'Male', 'Target': '<=50K'},
    {'Age': '50', 'Education': 'E - Bachelors', 'Occupation': 'Exec-managerial', 'Relationship': 'Husband', 'Race': 'White', 'Sex': 'Male', 'Target': '<=50K'},
    {'Age': '38', 'Education': 'B - HS Diploma', 'Occupation': 'Handlers-cleaners', 'Relationship': 'Not-in-family', 'Race': 'White', 'Sex': 'Male', 'Target': '<=50K'},
    {'Age': '53', 'Education': 'A - No HS Diploma', 'Occupation': 'Handlers-cleaners', 'Relationship': 'Husband', 'Race': 'Black', 'Sex': 'Male', 'Target': '<=50K'},
    {'Age': '28', 'Education': 'E - Bachelors', 'Occupation': 'Prof-specialty', 'Relationship': 'Wife', 'Race': 'Black', 'Sex': 'Female', 'Target': '<=50K'},
    {'Age': '37', 'Education': 'F - Graduate Degree', 'Occupation': 'Exec-managerial', 'Relationship': 'Wife', 'Race': 'White', 'Sex': 'Female', 'Target': '<=50K'},
    {'Age': '49', 'Education': 'A - No HS Diploma', 'Occupation': 'Other-service', 'Relationship': 'Not-in-family', 'Race': 'Black', 'Sex': 'Female', 'Target': '<=50K'},
    {'Age': '52', 'Education': 'B - HS Diploma', 'Occupation': 'Exec-managerial', 'Relationship': 'Husband', 'Race': 'White', 'Sex': 'Male', 'Target': '>50K'},
    {'Age': '31', 'Education': 'F - Graduate Degree', 'Occupation': 'Prof-specialty', 'Relationship': 'Not-in-family', 'Race': 'White', 'Sex': 'Female', 'Target': '>50K'},
    {'Age': '42', 'Education': 'E - Bachelors', 'Occupation': 'Exec-managerial', 'Relationship': 'Husband', 'Race': 'White', 'Sex': 'Male', 'Target': '>50K'},
]

education_level_frequencies = Counter() # or defaultdict(int)
for entry in salary_data:
    education_level_frequencies[entry['Education']] += 1
education_level_frequencies = dict(education_level_frequencies)

# Equivalent one liner to above:
# education_level_frequencies = dict(Counter(entry['Education'] for entry in salary_data))

print(education_level_frequencies)

Alternatively, use the get() method if you want to stick with a standard Python dictionary:

education_level_frequencies = {}
for entry in salary_data:
    education_val = entry['Education']
    education_level_frequencies[education_val] = education_level_frequencies.get(
            education_val, 0) + 1

print(education_level_frequencies)

Output:

{'E - Bachelors': 4, 'B - HS Diploma': 2, 'A - No HS Diploma': 2, 'F - Graduate Degree': 2}
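
Note that this output has no entries for 'C - Some College' or 'D - Associates', because they never occur in the data. If those levels should appear with a count of 0, one possible approach is to look each known level up in a Counter, which returns 0 for missing keys. This is a minimal sketch that assumes the unique_education_levels list from the question and a shortened, made-up sample:

```python
from collections import Counter

unique_education_levels = [
    'A - No HS Diploma', 'B - HS Diploma', 'C - Some College',
    'D - Associates', 'E - Bachelors', 'F - Graduate Degree',
]

# Shortened sample for illustration only (not the full dataset).
salary_data = [
    {'Education': 'E - Bachelors'},
    {'Education': 'F - Graduate Degree'},
    {'Education': 'E - Bachelors'},
]

counts = Counter(entry['Education'] for entry in salary_data)

# Indexing a Counter with a missing key returns 0 instead of raising KeyError,
# so every known level ends up in the result, in the order of the list.
education_level_frequencies = {
    level: counts[level] for level in unique_education_levels
}

print(education_level_frequencies)
```

In this variant unique_education_levels is not redundant: it decides which keys appear and in what order.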
Mirza715 On

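You can do something like this:

from collections import defaultdict

education_level_frequencies = defaultdict(int)
for data in salary_data:
    education = data['Education']
    # Adds 1 when the level is in the known list, otherwise adds 0.
    education_level_frequencies[education] += int(education in unique_education_levels)

This gives the frequency of every education level that appears in salary_data. A level that is not in unique_education_levels still gets a key, with a count of 0.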