Why is concat reformatting my headings?


I have sorted a CSV file the way I want and appended a column so that my data sorts properly. However, after using concat (I think this is where the issue is, anyway), the headings in the output CSV file have changed to (0L, 'HeadingTitle'). I just want each heading to be HeadingTitle.

import numpy as np
import pandas as pd
import pandas.util.testing as tm; tm.N = 3

data = pd.DataFrame.from_csv('MYDATA.csv')
byqualityissue = data.groupby(["CompanyName","QualityIssue"]).size()
df = pd.DataFrame(byqualityissue)

formatted = df.unstack(level=-1)
formatted[np.isnan(formatted)] = 0

includingtotals = pd.concat([formatted,pd.DataFrame(formatted.sum(axis=1),columns=['Total'])],axis=1)
sorted = includingtotals.sort_index(by=['Total'], ascending=[False])
#del sorted['Total']
sorted.to_csv('byqualityissue.csv')

Where the output headings are:

CompanyName, (0L, 'Equipment'), (0L, 'User'), (0L, 'Neither'), Total

How do I modify this so that I only have the heading titles?

Edit: If I print sorted.columns the output is

Index([(0, u'Equipment'), (0, u'User'), (0, u'Neither'), u'Total'], dtype='object')

There is 1 answer

mcwitt On BEST ANSWER

In the line

df = pd.DataFrame(byqualityissue)

you don't give the column a name, so it takes the default value 0. Then when you call unstack,

formatted = df.unstack(level=-1)

the result has hierarchical columns with 0 in the first level. To fix this you can substitute the previous line with

formatted = df.unstack(level=-1)[0]
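
For reference, here is a minimal self-contained sketch of the same effect and the fix. The sample data is made up to stand in for MYDATA.csv, and the variable names (counts, nested, flat, result) are just for illustration. It also assumes a recent pandas, where sort_values replaces the older sort_index(by=...) call used in the question.

```python
import pandas as pd

# Hypothetical sample data standing in for MYDATA.csv
data = pd.DataFrame({
    "CompanyName": ["A", "A", "B", "B", "B", "C"],
    "QualityIssue": ["Equipment", "User", "User", "User", "Neither", "Equipment"],
})

# Count rows per (CompanyName, QualityIssue) pair; the result is an unnamed Series
counts = data.groupby(["CompanyName", "QualityIssue"]).size()

# Wrapping the unnamed Series in a DataFrame labels its single column 0
df = pd.DataFrame(counts)

# Unstacking that DataFrame yields two-level columns such as (0, 'Equipment')
nested = df.unstack(level=-1)

# Selecting the 0 level leaves only the QualityIssue names as column labels
flat = df.unstack(level=-1)[0]

# Fill missing combinations with 0, add the row totals, and sort by them
result = flat.fillna(0)
result["Total"] = result.sum(axis=1)
result = result.sort_values("Total", ascending=False)

# Alternative: unstack the Series itself, which never creates the 0 level
flat_direct = counts.unstack(fill_value=0)

print(nested.columns.tolist())
print(result.columns.tolist())
```

Unstacking the Series directly (the last step) avoids the intermediate DataFrame altogether, so it may be the simpler option if you don't need df for anything else.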