Dynamically create, merge & save dataframes in a for loop

I have two different datasets. One dataset describes levels and a location (contains 4 files). The second dataset describes technologies and a location (contains 3 files).

import os 
import pandas as pd 
import glob 


technology = glob.glob("C:\\path\\*.xlsx", recursive = True)
level = glob.glob("C:\\path\\*.xlsx", recursive = True)
 
d = {}

for level, technology in zip (level, technology):
    d[level technology] = pd.merge(technology, level, how= "inner",left_on=["Location"],right_on=["Location"])
    d.to_excel(d[level technology]+ '.xlsx')           
  1. With d = {} I try to create a dataframe that I can rename.
  2. With the for loop I try to merge every single technology file with every single level file based on the column Location.
  3. Then I try to save the 12 merged files, each named after the original technology and level file names.

Am I even using the right method? At the moment I get the following error message: TypeError: Can only merge Series or DataFrame objects, a <class 'str'> was passed

There is 1 answer

DataSciRookie On BEST ANSWER

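The error comes from passing file paths to pandas.merge. glob.glob returns lists of file paths (strings), so your technology and level variables hold strings, not DataFrame objects. Each file has to be loaded into a DataFrame with pd.read_excel before it can be merged. Note also that zip pairs the files one-to-one and stops at the shorter list, so with 3 technology files and 4 level files it would produce only 3 merges; nested loops cover all 12 combinations. The code here also replaces d[level technology], which is not valid Python, with a single string key, and calls to_excel on the merged DataFrame rather than on the dictionary.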
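To see the string-versus-DataFrame difference in isolation, here is a minimal sketch that needs no Excel files. The two small DataFrames, their column values, and the file names passed as strings are made up for illustration; only the Location column comes from the question.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for one technology file and one level file
technology_df = pd.DataFrame({"Location": ["A", "B"], "Technology": ["solar", "wind"]})
level_df = pd.DataFrame({"Location": ["A", "C"], "Level": [1, 2]})

# Passing strings (which is what glob.glob returns) triggers the TypeError
try:
    pd.merge("technology1.xlsx", "level1.xlsx", how="inner", on="Location")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing DataFrames works; the inner join keeps only locations present in both
merged_df = pd.merge(technology_df, level_df, how="inner", on="Location")

print(error_message)
print(merged_df)
```

With this sample data only location "A" appears in both frames, so the inner join keeps a single row.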

import os
import pandas as pd
import glob

technology_files = glob.glob("C:\\path\\technology*.xlsx", recursive=True)
level_files = glob.glob("C:\\path\\level*.xlsx", recursive=True)

output_dir = "C:\\path\\merged_files\\"
os.makedirs(output_dir, exist_ok=True)

merged_files = {}

for technology_path in technology_files:
    # Load each technology file once, outside the inner loop
    technology_df = pd.read_excel(technology_path)
    for level_path in level_files:
        # Load the current level file into a DataFrame
        level_df = pd.read_excel(level_path)
        
        # Merge on 'Location'
        merged_df = pd.merge(technology_df, level_df, how="inner", on="Location")
        
        # Create a unique key/name for the dictionary and the output file
        technology_filename = os.path.splitext(os.path.basename(technology_path))[0]
        level_filename = os.path.splitext(os.path.basename(level_path))[0]
        merged_key = f"{technology_filename}_{level_filename}"
        
        # Store the merged DataFrame in the dictionary
        merged_files[merged_key] = merged_df
        
        # Save the merged DataFrame to an Excel file
        output_filepath = os.path.join(output_dir, f"{merged_key}.xlsx")
        merged_df.to_excel(output_filepath, index=False)

print("Merging and saving completed.")
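The zip-based loop in the question and the nested loops above pair the files differently, which is why the expected 12 outputs only appear with the nested version. The sketch below compares the two using plain strings; the file names are hypothetical placeholders, and itertools.product is shown only as an equivalent way to write the nested loops.

```python
from itertools import product
from pathlib import Path

# Hypothetical file names standing in for the results of glob.glob
technology_files = ["technology1.xlsx", "technology2.xlsx", "technology3.xlsx"]
level_files = ["level1.xlsx", "level2.xlsx", "level3.xlsx", "level4.xlsx"]

# zip pairs items by position and stops at the shorter list
zipped_pairs = list(zip(technology_files, level_files))

# product yields every technology/level combination, like two nested loops
all_pairs = list(product(technology_files, level_files))

# Build one output name per combination from the original file names
merged_keys = [f"{Path(tech).stem}_{Path(level).stem}" for tech, level in all_pairs]

print(len(zipped_pairs), len(all_pairs))
print(merged_keys[0], merged_keys[-1])
```

Either form of the loop works; the nested loops make it easier to read each technology file only once.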