Print data of csv file after specific string is found


I am quite new to Python and eager to learn. I want to open a CSV file and use its data, but only the data that comes after a specific marker line (s,s,N,s). The data consists of four columns of values separated by ",". The marker line s,s,N,s is not always in the same row, so I cannot use a fixed row number.

CSV File

Can you help me figure out how to use this data? My code so far looks like this:

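import csv

# Path from the question; encoding matches the original read_csv call
path = r'C:\Users\User\Documents\pythonProject\filename.csv'

found_marker = False
with open(path, newline='', encoding='latin-1') as f:
    for row in csv.reader(f):
        if found_marker:
            print(row)  # print every row after the marker line
        elif row[:4] == ['s', 's', 'N', 's']:
            # csv.reader splits on ",", so the marker arrives as four separate cells
            found_marker = True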

Answer by Achille G

Here is a working example I've used for a personal project. Don't mind the chardet module; it is only there to detect the file's encoding.

import chardet
import pandas as pd


def read_file(path, keyword, delim=";", encoding="latin-1"):
    """
    Read a CSV file into a data frame, starting at the first line that contains keyword.

    @param path - path to the file to read
    @param keyword - text to look for; the first line containing it becomes the header row
    @param delim - delimiter to use when reading the CSV file (default is ";")
    @param encoding - fallback encoding if detection fails (default is "latin-1")
    @return data frame, or None if the keyword is not found
    """
    # Read raw bytes so chardet inspects the file's actual contents
    with open(path, "rb") as f:
        raw = f.read()
    detected = chardet.detect(raw)["encoding"]
    if detected is None or detected == "ascii":
        detected = encoding
    lines = raw.decode(detected, errors="replace").splitlines()

    # Find the first line containing the keyword (search the whole file)
    num = None
    for i, line in enumerate(lines):
        if keyword in line:
            num = i
            break
    if num is None:
        return None

    try:
        # Skip everything above the keyword line; that line becomes the header
        return pd.read_csv(path, delimiter=delim, skiprows=num, on_bad_lines="skip", encoding=detected)
    except (UnicodeDecodeError, pd.errors.ParserError):
        # Fall back to tab-separated latin-1
        return pd.read_csv(path, delimiter="\t", skiprows=num, on_bad_lines="skip", encoding="latin-1")

Call this function with the path to your file and, in your case, the keyword 's,s,N,s'. Since your file is comma-separated, also pass delim=",", because the default delimiter is ";".
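If you would rather avoid chardet and counting rows for skiprows, a simpler pandas-only approach is sketched below. It assumes the file is comma-separated, that the marker line starts with s,s,N,s, and that the marker line itself is not part of the data. The name read_after_marker is a hypothetical helper, not part of pandas; it takes an already opened text file so you choose the encoding when opening it.

```python
import io

import pandas as pd


def read_after_marker(f, marker="s,s,N,s"):
    """Return a DataFrame built from the lines that follow the first marker line.

    `f` is any open text file (or file-like object). Returns None if no line
    starts with `marker`.
    """
    lines = f.readlines()

    # Find the marker's row number at runtime instead of hard-coding it
    start = next(
        (i for i, line in enumerate(lines) if line.strip().startswith(marker)),
        None,
    )
    if start is None:
        return None

    # Pass only the lines after the marker to pandas; they have no header row
    body = "".join(lines[start + 1:])
    if not body.strip():
        return pd.DataFrame()  # marker was the last line, so no data follows
    return pd.read_csv(io.StringIO(body), sep=",", header=None)


# Usage with the path from the question:
# with open(r'C:\Users\User\Documents\pythonProject\filename.csv', encoding='latin-1') as f:
#     df = read_after_marker(f)
```

If you want the marker line to serve as the header instead, slice from lines[start:] and use header=0; pandas will then rename the duplicate s columns to s.1 and s.2.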