Merge and combine time frames for excel file


I am trying to merge 15+ Excel files into one large file. The files also have mismatched time frames, and I want to combine some of those time frames into one. For example, if one sheet has 10:15am - 10:30am with an associated value A, and another sheet has 10am - 11am with an associated "balance" value B, then the merged result should have 10:15am - 10:30am with both values A and B, since B's time frame covers that interval as well.

This is what I have so far, which merges the files into one. But now I am having trouble matching the time frames. Please help! Thanks!

import os
import pandas as pd
path = os.getcwd()
files = os.listdir(path)

files_csv = [f for f in files if f.endswith('.csv')]

dfs = []

for f in files_csv:
    data = pd.read_csv(f)
    dfs.append(data)

df = pd.concat(dfs, ignore_index=True)

print(df)

There is 1 answer

Ipeleng Floyd Bela On

To solve your problem, first make sure the CSV files are in your current working directory. You can then use the following code to merge them and align the data based on overlapping time intervals:

import os
import pandas as pd

# Get the current working directory
path = os.getcwd()

# Get all the csv files in the directory
files_csv = [f for f in os.listdir(path) if f.endswith('.csv')]

# Initialize an empty list to store the dataframes
dfs = []

# Read each csv file and append the dataframe to the list
for f in files_csv:
    data = pd.read_csv(f)
    # Convert the time columns to datetime
    data['start_time'] = pd.to_datetime(data['start_time'])
    data['end_time'] = pd.to_datetime(data['end_time'])
    dfs.append(data)

# Concatenate all dataframes
df = pd.concat(dfs, ignore_index=True)

# Sort the dataframe by start_time
df = df.sort_values('start_time')

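# Group rows whose time intervals overlap and aggregate the values.
# After sorting, a new group starts when a row begins after every earlier row has ended.
prev_end = df['end_time'].cummax().shift()
df['interval'] = (df['start_time'] > prev_end).cumsum()
value_cols = df.select_dtypes('number').columns.drop('interval')
agg = {'start_time': 'min', 'end_time': 'max', **{c: 'sum' for c in value_cols}}
df = df.groupby('interval').agg(agg).reset_index()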

# Print the merged dataframe
print(df)

Replace 'start_time' and 'end_time' with the actual names of the columns in your CSV files that hold the start and end times. Feel free to ask more questions. I hope this helps!
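
The example in the question (a 10:15am - 10:30am row picking up the value of a 10am - 11am row) is a containment match rather than a group-by. Here is a minimal sketch of that rule, assuming two already-loaded dataframes that both have 'start_time' and 'end_time' columns. The function name, the 'value' and 'balance' columns, and the dates are hypothetical and only there for illustration.

```python
import pandas as pd

def attach_containing_values(narrow, wide):
    """For each row of `narrow`, attach the values of `wide` rows whose interval contains it."""
    # Cross join pairs every narrow row with every wide row (needs pandas >= 1.2).
    pairs = narrow.merge(wide, how="cross", suffixes=("", "_wide"))
    # Keep only the pairs where the wide interval fully covers the narrow one.
    covered = pairs[
        (pairs["start_time_wide"] <= pairs["start_time"])
        & (pairs["end_time"] <= pairs["end_time_wide"])
    ]
    return covered.drop(columns=["start_time_wide", "end_time_wide"]).reset_index(drop=True)

# Hypothetical sample data mirroring the example in the question.
narrow = pd.DataFrame({
    "start_time": pd.to_datetime(["2024-01-01 10:15", "2024-01-01 11:30"]),
    "end_time": pd.to_datetime(["2024-01-01 10:30", "2024-01-01 11:45"]),
    "value": ["A", "C"],
})
wide = pd.DataFrame({
    "start_time": pd.to_datetime(["2024-01-01 10:00"]),
    "end_time": pd.to_datetime(["2024-01-01 11:00"]),
    "balance": ["B"],
})

merged = attach_containing_values(narrow, wide)
print(merged)
```

Two things to keep in mind with this sketch: narrow rows that no wide interval covers (the 11:30 row here) are dropped from the result, and a cross join builds every pair of rows before filtering, so it may get slow or memory-hungry on large files.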