Removing space in dataframe python

Question

Removing space in dataframe python

22.5k views Asked by jenryb At 10 June 2015 at 17:30

I am getting an error in my code because I tried to make a dataframe by calling an element from a csv. I have two columns I call from a file: CompanyName and QualityIssue. There are three types of Quality issues: Equipment Quality, User, and Neither. I run into problems trying to make a dataframe df.Equipment Quality, which obviously doesn't work because there is a space there. I want to take Equipment Quality from the original file and replace the space with an underscore.

input:

Top Calling Customers,         Equipment Quality,    User,    Neither,
Customer 3,                      2,           2,        0,
Customer 1,                      0,           2,        1,
Customer 2,                      0,           1,        0,
Customer 4,                      0,           1,        0,

Here is my code:

import numpy as np
import pandas as pd
import pandas.util.testing as tm; tm.N = 3

# Get the data.
data = pd.DataFrame.from_csv('MYDATA.csv')   
# Group the data by calling CompanyName and QualityIssue columns.
byqualityissue = data.groupby(["CompanyName", "QualityIssue"]).size() 
# Make a pandas dataframe of the grouped data.
df = pd.DataFrame(byqualityissue) 
# Change the formatting of the data to match what I want SpiderPlot to read.
formatted = df.unstack(level=-1)[0]  
# Replace NaN values with zero.
formatted[np.isnan(formatted)] = 0 
includingtotals = pd.concat([formatted,pd.DataFrame(formatted.sum(axis=1), 
                             columns=['Total'])], axis=1)
sortedtotal = includingtotals.sort_index(by=['Total'], ascending=[False])
sortedtotal.to_csv('byqualityissue.csv')

This seems to be a frequently asked question and I tried lots of the solutions but they didn't seem to work. Here is what I tried:

with open('byqualityissue.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    return [[x.strip() for x in row] for row in reader]
    sentence.replace(" ", "_")

And

sortedtotal['QualityIssue'] = sortedtotal['QualityIssue'].map(lambda x: x.rstrip(' '))

And what I thought was the most promising from here http://pandas.pydata.org/pandas-docs/stable/text.html:

formatted.columns = formatted.columns.str.strip().str.replace(' ', '_')

but I got this error: AttributeError: 'Index' object has no attribute 'str'

Thanks for your help in advance!

Original Q&A

There are 2 answers

JBWhitmore On 22 November 2015 at 01:13

As I understand your question, the following should work (test it out with inplace=False to see how it looks first if you want to be careful):

sortedtotal.rename(columns=lambda x: x.replace(" ", "_"), inplace=True)

And if you have white space surrounding the column names, like: "This example "

sortedtotal.rename(columns=lambda x: x.strip().replace(" ", "_"), inplace=True)

which strips leading/trailing whitespace, then converts internal spaces to "_".

**Alexander** · Accepted Answer · 2015-06-10T19:35:14+00:00

Alexander On 10 June 2015 at 19:35 BEST ANSWER

Try:

formatted.columns = [x.strip().replace(' ', '_') for x in formatted.columns]

TechQA.

Removing space in dataframe python

There are 2 answers

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in WHITESPACE

Related Questions in DATAFRAME

Related Questions in STRIP

Popular Questions

Trending Questions