I want to create a script that processes files in a folder based on their names:.
- There are 2 types of files: (alpha & beta)
- All of the alpha files require the same manipulation (concatenation, filling blank cells, addition of a new column...)
- All of the beta files require the same manipulation (concatenation, filling blank cells, storage of certain variable...)
- Eventually, I'll need to merge alpha files with their corresponding beta files using a left merge.
My questions
- What's the best way to proceed with this task?
- Can we handle all alpha files simultaneously?
- Do we need to merge alpha files into one and beta files into another for further processing?
# IMPORT MODULES
import openpyxl as op
import pandas as pd
import xlsxwriter
import os
from xlsxwriter.workbook import Workbook
from openpyxl import Workbook, load_workbook
import glob
# MY FOLDER PATH
my_folder_path = r'C:\Users\machukovich\Desktop\working_folder'
list_of_files = os.listdir(my_folder_path)
print('This folder contains {len_folder} file(s): \n'.format(len_folder=len(list_of_files)))
for file_name in list_of_files:
print(file_name)
# DEFINING THE ALPHA FILES
alpha_files = glob.glob(r'C:\Users\Edu\Desktop\working_folder\*_alpha.xlsx')
for file_name in alpha_files:
print(file_name)
# DEFINING THE BETA FILES
beta_files = glob.glob(r'C:\Users\Edu\Desktop\working_folder\*_beta.xlsx')
for file_name in beta_files:
print(file_name)
# FROM HERE IDK HOW TO CONTINUE ( THIS OPTION THROWS ERROR )
dfs_alpha = pd.read_excel(alpha_files, sheet_name=None, skiprows=3)
For more clarity, the output for my folder is the following:
This folder contains 4 file(s):
first_file_alpha.xlsx
first_file_beta.xlsx
second_file_alpha.xlsx
second_file_beta.xlsx