I've selected a set of records from a larger data set that represents a specific organization. I need to iterate over this data and produce a pivot table that compares the demographic information to how that demographic answered the survey questions. There are 5 demographic questions, each with multiple choices for categorization. A pivot table provides a count of how each demographic category answered a survey question. I need to produce 5 pivot tables for each of the remaining 17 questions and store them to a dataframe. I'll stack the resulting dataframes later. The problem I'm having is trying to iterate over the main original dataframe, replacing the two column names for the crosstab. I can easily do one manually with the following code:
table18 = pd.pivot_table(org1_df, values='Business size?', index=['Business type?'],
columns=['Survey Question 1'],
aggfunc=['count'], fill_value=0)
Or
cross = pd.crosstab(org1_df["Business type?"], org1_df["Survey Question 2"])
cross
Here's the pivot table I was able to create:
The point is to determine how a small business answered the survey questions compared with a large business' answers. And so on for the other dozen+ categories we have on this survey.`
| Respondents | Business Type? | Business Size? | Survey Question 1 | Survey Question 2 |
|---|---|---|---|---|
| Resp_1 | 2 | 2 | 3 | 2 |
| Resp_2 | 2 | 1 | 4 | 3 |
Sample data above.
I stored the demographic column names in a list:
column_names = ["Business type?", "Business size?"]
then added those to a dictionary:
d = {}
for name in column_names:
d[name] = pd.DataFrame()
I tried a dictionary comprehension based on a similar question and answer:
dfs = {i[1]: pd.pivot_table(org1_df, index=i, values='Business type?',
aggfunc='count', fill_value=0) for i, org1_df in zip(column_names, d)}
I received this error: AttributeError: 'str' object has no attribute 'columns'
I would greatly appreciate help on this task. As an aside, I'm not sure why when I leave values='Business type?' it doesn't matter for the resulting pivot table. The answers I'm getting matched what I get in Excel.
I'm working my way to a diverging stacked bar chart using matplotlib, but I have to get the data organized first.
UPDATE: org1_df output Based on the mock data above, I would produce 4 pivot tables from that.
- Survey Question 1 - a count of the answers for "Business type?"
- Survey Question 1 - a count of the answers for "Business size?"
- Survey Question 2 - a count of the answers for "Business type?"
- Survey Question 4 - a count of the answers for "Business size?"