Pandas read_csv() with multiple delimiters not working


I have a csv file I'm trying to read as a pandas dataframe. I need to skip the first 19 rows of comments; the headers are in the 20th row and the data in subsequent rows. The only issue is that the header row starts with a '#', which shifts the headers over. The rest of the data is delimited with spaces. For some reason, sep=r'#|\s+' introduces two 'Unnamed' columns into the dataset.

Raw Data Input (row number shown):
01|# comments...
02|# comments...
03|# comments...
.
.
.
19|# comments...
20|# Header1 Header2 Header3
21|Data1 Data2 Data3

Code:

df = pd.read_csv(df_path, skiprows=19, sep=r'#|\s+', engine='python', encoding='utf-8')

Output df:

Unnamed:0 Unnamed:1 Header1
Data1 Data2 Data3

Desired Output df:

Header1 Header2 Header3
Data1 Data2 Data3

How can I handle the extra '#' in the header row without running into this issue? I've also tried using

sep=r'[#|\s+]'
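As context for the question: both patterns split the header line into two leading empty fields (the '#' and the space after it each act as a delimiter), which is exactly what pandas surfaces as 'Unnamed' columns. A quick re.split check on the sample header makes this visible:

```python
import re

header = "# Header1 Header2 Header3"

# Alternation: '#' OR a run of whitespace each delimit,
# so '#' and the space after it produce two empty fields.
print(re.split(r"#|\s+", header))
# → ['', '', 'Header1', 'Header2', 'Header3']

# Character class: matches any single '#', '|', whitespace, or '+',
# which splits the sample header the same way.
print(re.split(r"[#|\s+]", header))
# → ['', '', 'Header1', 'Header2', 'Header3']
```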

There are 2 answers

Answer from Atul sanwal:
import pandas as pd

# Define the path to your CSV file
df_path = 'your_file.csv'

# Read the header line (row 20) manually and strip its leading '#';
# with comment='#' alone, pandas would discard the header row entirely.
with open(df_path, encoding='utf-8') as f:
    for _ in range(20):
        header_line = f.readline()
names = header_line.lstrip('#').split()

# Read the data rows on whitespace; comment='#' drops all comment lines
df = pd.read_csv(df_path, sep=r'\s+', engine='python', encoding='utf-8',
                 comment='#', header=None, names=names)

# Print the DataFrame
print(df)
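As a self-contained check, the sample layout from the question can be written to a temporary file, the header line read manually, and the data parsed with comment='#' dropping the comment rows (the sample rows below are stand-ins for the real file):

```python
import os
import tempfile
import pandas as pd

# Write the question's layout to a temporary file:
# 19 comment lines, a '#'-prefixed header on line 20, then one data row.
rows = ["# comment"] * 19 + ["# Header1 Header2 Header3", "Data1 Data2 Data3"]
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("\n".join(rows) + "\n")
    path = f.name

# Read the header line (row 20) manually and strip the leading '#'
with open(path, encoding="utf-8") as f:
    for _ in range(20):
        header_line = f.readline()
names = header_line.lstrip("#").split()

# comment='#' discards every comment line, leaving only the data rows
df = pd.read_csv(path, sep=r"\s+", engine="python", comment="#",
                 header=None, names=names)
os.unlink(path)
print(df)
```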
Answer from e-motta:
# From OP
df = pd.read_csv(
    df_path, skiprows=19, sep=r"#|\s+", engine="python", encoding="utf-8"
)
# Get the column names, minus the two unwanted 'Unnamed' ones
new_cols = df.columns[2:]
# Drop the two trailing columns (NaN-filled, since data rows have fewer fields)
df = df.iloc[:, : len(df.columns) - 2]
# Rename the columns
df.columns = new_cols

This will result in:

  Header1 Header2 Header3
0   Data1   Data2   Data3
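This fix can also be verified without a file on disk; io.StringIO stands in for df_path, and the sample rows mirror the question's layout:

```python
import io
import pandas as pd

# In-memory stand-in for the file: 19 comment lines, a '#'-prefixed
# header on line 20, then one data row.
raw = "\n".join(["# comment"] * 19
                + ["# Header1 Header2 Header3", "Data1 Data2 Data3"]) + "\n"

df = pd.read_csv(io.StringIO(raw), skiprows=19, sep=r"#|\s+", engine="python")
new_cols = df.columns[2:]               # the real header names
df = df.iloc[:, : len(df.columns) - 2]  # keep only the columns holding data
df.columns = new_cols
print(df)
```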