I have a piece of code that manipulates data of a txt file and writes a new csv file with the manipulated data. The original file does not have headers and column 1 includes unwanted data.
The code does 3 things:
- Removes two of the 4 columns
- Adds column headers
- Changes the content of one of the remaining columns to remove characters around the desired numbers (basically takes out prefix and suffix around the numbers).
import pandas as pd
file = pd.read_csv("example.txt", usecols=[0,1], header=None) #only get the first 2 columns; header=None because the file has no header row
headerList = ['store', 'sku'] #name headers
file.to_csv("test.csv", header=headerList, index=False) #create new csv file headers
file = pd.read_csv("test.csv") #read new file including headers
file['store']=file['store'].str.split('R ').str[-1] #remove chars before str num
file['store']=file['store'].str.split(' -').str[0] #remove chars after str num
file.to_csv("test.csv", index=False) #updates the header file
This works for one file at a time, but I would like to apply it to every file in a zip file. The files are all formatted the same way but have different names and data. Is there a way to loop over each file in the zip, run this code on it, and create a new zip file with the modified data?
From the read_csv docs, you can pass in a filename or a buffer (that is, a file-like object). zipfile.ZipFile.open will open a file contained in a zip file. Put those together and you can enumerate the zip file, processing each member in turn. You can also apply your own header to the data as you read it, so there is no need for an intermediate file.
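Something along these lines should work as a starting point. It is only a sketch: the function name process_zip and the archive paths are made up for illustration, and it assumes every member of the zip is a comma-separated text file with the same layout as example.txt. Each cleaned table is written into the new zip under the original name with a .csv extension.

```python
import os
import zipfile

import pandas as pd


def process_zip(src_path, dst_path):
    """Clean every file in the zip at src_path and write the results to dst_path.

    process_zip is a hypothetical helper name, not part of pandas or zipfile.
    """
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.is_dir():
                continue  # skip directory entries, only process files
            # ZipFile.open returns a file-like object that read_csv accepts
            with src.open(info) as member:
                df = pd.read_csv(
                    member,
                    header=None,             # the source files have no header row
                    usecols=[0, 1],          # keep only the first two columns
                    names=["store", "sku"],  # apply the headers while reading
                )
            # same prefix/suffix stripping as in the question
            df["store"] = df["store"].str.split("R ").str[-1]
            df["store"] = df["store"].str.split(" -").str[0]
            # write the cleaned table straight into the new zip, no temp file
            out_name = os.path.splitext(info.filename)[0] + ".csv"
            dst.writestr(out_name, df.to_csv(index=False))
```

You would call it as process_zip("input.zip", "output.zip"), where both names are placeholders for your own paths. If the zip also contains files you do not want to touch, add a check on info.filename inside the loop.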