Removing space in dataframe python

22.5k views Asked by At

I am getting an error in my code because I tried to make a dataframe by calling an element from a csv. I have two columns I call from a file: CompanyName and QualityIssue. There are three types of Quality issues: Equipment Quality, User, and Neither. I run into problems trying to make a dataframe df.Equipment Quality, which obviously doesn't work because there is a space there. I want to take Equipment Quality from the original file and replace the space with an underscore.

input:

Top Calling Customers,         Equipment Quality,    User,    Neither,
Customer 3,                      2,           2,        0,
Customer 1,                      0,           2,        1,
Customer 2,                      0,           1,        0,
Customer 4,                      0,           1,        0,

Here is my code:

import numpy as np
import pandas as pd
import pandas.util.testing as tm; tm.N = 3

# Get the data.
data = pd.DataFrame.from_csv('MYDATA.csv')   
# Group the data by calling CompanyName and QualityIssue columns.
byqualityissue = data.groupby(["CompanyName", "QualityIssue"]).size() 
# Make a pandas dataframe of the grouped data.
df = pd.DataFrame(byqualityissue) 
# Change the formatting of the data to match what I want SpiderPlot to read.
formatted = df.unstack(level=-1)[0]  
# Replace NaN values with zero.
formatted[np.isnan(formatted)] = 0 
includingtotals = pd.concat([formatted,pd.DataFrame(formatted.sum(axis=1), 
                             columns=['Total'])], axis=1)
sortedtotal = includingtotals.sort_index(by=['Total'], ascending=[False])
sortedtotal.to_csv('byqualityissue.csv')

This seems to be a frequently asked question and I tried lots of the solutions but they didn't seem to work. Here is what I tried:

with open('byqualityissue.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    return [[x.strip() for x in row] for row in reader]
    sentence.replace(" ", "_")

And

sortedtotal['QualityIssue'] = sortedtotal['QualityIssue'].map(lambda x: x.rstrip(' ')) 

And what I thought was the most promising from here http://pandas.pydata.org/pandas-docs/stable/text.html:

formatted.columns = formatted.columns.str.strip().str.replace(' ', '_')

but I got this error: AttributeError: 'Index' object has no attribute 'str'

Thanks for your help in advance!

2

There are 2 answers

0
Alexander On BEST ANSWER

Try:

formatted.columns = [x.strip().replace(' ', '_') for x in formatted.columns]
0
JBWhitmore On

As I understand your question, the following should work (test it out with inplace=False to see how it looks first if you want to be careful):

sortedtotal.rename(columns=lambda x: x.replace(" ", "_"), inplace=True)

And if you have white space surrounding the column names, like: "This example "

sortedtotal.rename(columns=lambda x: x.strip().replace(" ", "_"), inplace=True)

which strips leading/trailing whitespace, then converts internal spaces to "_".