Is it possible to apply the same code to all txt files within a zip file in Python?


I have a piece of code that manipulates data of a txt file and writes a new csv file with the manipulated data. The original file does not have headers and column 1 includes unwanted data.

The code does 3 things:

  1. Removes two of the 4 columns
  2. Adds column headers
  3. Changes the content of one of the remaining columns to remove characters around the desired numbers (basically takes out prefix and suffix around the numbers).

import pandas as pd

# the file has no header row, so header=None keeps the first line as data
df = pd.read_csv("example.txt", usecols=[0, 1], header=None)  # only get the first 2 columns

headerList = ['store', 'sku']  # name headers

df.to_csv("test.csv", header=headerList, index=False)  # create new csv file with headers

df = pd.read_csv("test.csv")  # read new file including headers

df['store'] = df['store'].str.split('R ').str[-1]  # remove chars before the store number
df['store'] = df['store'].str.split(' -').str[0]  # remove chars after the store number

df.to_csv("test.csv", index=False)  # update the csv file with the cleaned column

This is easy to do with one file at a time, but I would like to apply this code to all files within a zip file that are formatted the same way, but have different names and data. Is there a way to maybe create a loop that goes through each file within the zip to run this code and create a new zip file with the modified data?

Best answer, by tdelaney

According to the read_csv docs, you can pass in either a filename or a buffer (that is, a file-like object). zipfile.ZipFile.open opens a file contained in a zip archive and returns such a file-like object. Put those together and you can enumerate the members of the zip file, processing each one in turn. You can also apply your own column names to the data as you read it, so there is no need for an intermediate file.

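import pandas as pd
import zipfile

with zipfile.ZipFile("example.zip") as zippy:
    for info in zippy.infolist():  # one ZipInfo entry per archive member
        if info.is_dir():
            continue  # skip directory entries
        # The files have no header row, so header=None keeps the first line
        # as data and names supplies the column headers.
        with zippy.open(info) as member:
            df = pd.read_csv(member, usecols=[0, 1],
                             header=None, names=['store', 'sku'])
        print(df)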
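
The loop above only prints each DataFrame, while the question also asks for the prefix/suffix cleanup and a new zip file holding the results. Here is a minimal sketch of one way to combine the two, not a definitive implementation. The helper name `clean_zip`, the `.txt` filter, the `dtype=str` choice, and renaming each member to `.csv` are all assumptions made for illustration. The split patterns are copied from the question's code.

```python
import zipfile
from pathlib import PurePosixPath

import pandas as pd


def clean_zip(src_path, dst_path):
    """Clean every .txt member of src_path and store it as .csv in dst_path.

    Hypothetical helper: its name, the .txt filter and the .csv renaming
    are assumptions for this sketch.
    """
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # Skip directories and anything that is not a .txt file.
            if info.is_dir() or not info.filename.lower().endswith(".txt"):
                continue
            # No header row in the source files, so header=None; names
            # labels the two kept columns. dtype=str (an assumption) keeps
            # the values as text, with no numeric conversion.
            with src.open(info) as member:
                df = pd.read_csv(member, usecols=[0, 1], header=None,
                                 names=["store", "sku"], dtype=str)
            # Same cleanup as in the question: drop the text around the number.
            df["store"] = df["store"].str.split("R ").str[-1]
            df["store"] = df["store"].str.split(" -").str[0]
            # Write the CSV text straight into the new archive, no temp file.
            out_name = str(PurePosixPath(info.filename).with_suffix(".csv"))
            dst.writestr(out_name, df.to_csv(index=False))
```

Calling `clean_zip("example.zip", "cleaned.zip")` would then read each txt file from `example.zip` and write the cleaned CSVs into `cleaned.zip` (both file names are placeholders). `ZipFile.writestr` accepts the string returned by `to_csv`, so nothing has to be written to disk in between.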