Customizing the separator in pandas read_csv

Question

86.3k views Asked by Peaceful At 20 December 2016 at 04:53

I am reading many different data files into pandas DataFrames. The columns in these files are separated by spaces, but the number of spaces differs from file to file (some use a single space, others two, and so on). So every time I import a file, I have to open it, count the spaces, and pass exactly that many spaces as sep:

import pandas as pd
df = pd.read_csv('myfile.dat', sep = '    ')

Is there any way I can tell pandas to assume "any number of spaces" as the separator? Also, is there any way I can tell pandas to use either tab (\t) or spaces as the separator?

There are 4 answers

piRSquared On 20 December 2016 at 05:04

You can also use the parameter skipinitialspace=True which skips the leading spaces after any delimiter.
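As a rough sketch of how this might look, assuming the columns are separated only by spaces (no tabs): the sample text and the name df_skip below are made up for illustration, and io.StringIO stands in for a real file such as 'myfile.dat'.

```python
import io
import pandas as pd

# Hypothetical sample: columns separated by a varying number of spaces.
raw = "a  b   c\n1  2   3\n4  5   6\n"

# A single space is the delimiter; skipinitialspace=True tells the parser
# to skip the extra spaces that follow each delimiter.
df_skip = pd.read_csv(io.StringIO(raw), sep=" ", skipinitialspace=True)
print(df_skip.columns.tolist())
```

Note that this only deals with spaces, so a file that mixes tabs and spaces would still need a different separator.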

nlahri On 03 July 2017 at 12:00

You can directly use delim_whitespace:

import pandas as pd
df = pd.read_csv('myfile.dat', delim_whitespace=True )

The argument delim_whitespace controls whether or not whitespace (e.g. ' ' or '\t') will be used as separator. See pandas.read_csv for details.

Dustin Williams On 10 April 2018 at 17:21

One thing I found is that if you use an unsupported separator (for example, a regular expression other than '\s+'), pandas/Dask has to fall back to the Python engine instead of the C engine, which is a good deal slower.
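To make the engine difference concrete, here is a small sketch. The sample text and variable names are hypothetical, and the behavior shown is what I would expect from recent pandas versions: '\s+' is special-cased for the C engine, while other multi-character separators are treated as regular expressions.

```python
import io
import pandas as pd

# Hypothetical sample mixing spaces and a tab between columns.
raw = "a  b\tc\n1 2   3\n"

# r"\s+" is special-cased, so the C engine can be requested explicitly.
df_c = pd.read_csv(io.StringIO(raw), sep=r"\s+", engine="c")

# Another multi-character separator is treated as a regex and is parsed
# by the (slower) Python engine.
df_py = pd.read_csv(io.StringIO(raw), sep=r"[ \t]+", engine="python")

# Explicitly forcing the C engine with a general regex separator is rejected.
try:
    pd.read_csv(io.StringIO(raw), sep=r"[ \t]+", engine="c")
    c_engine_rejected = False
except ValueError:
    c_engine_rejected = True

print(df_c.equals(df_py), c_engine_rejected)
```

If you leave engine unset, pandas falls back to the Python engine on its own and emits a ParserWarning rather than raising.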

Ted Petrou On 20 December 2016 at 04:59 BEST ANSWER

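Yes, you can use a simple regular expression like sep='\s+' to denote one or more whitespace characters. Since \s matches tabs as well as spaces, this also covers files that use either as the separator.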
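A minimal sketch of this, using made-up sample data in place of 'myfile.dat' (io.StringIO is only there so the snippet is self-contained):

```python
import io
import pandas as pd

# Hypothetical sample: columns separated by a mix of single spaces,
# multiple spaces, and tabs.
raw = "x  y\tz\n1 2   3\n4\t5 6\n"

# r"\s+" matches one or more whitespace characters (spaces or tabs).
# The raw-string prefix keeps Python from treating "\s" as an escape sequence.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```

With a real file, the call would simply be pd.read_csv('myfile.dat', sep=r'\s+'). As far as I know, recent pandas releases (2.2 and later) deprecate delim_whitespace in favor of this form, so it is probably the safer spelling going forward.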
