How do I copy/paste merged cells from excel via python?

403 views Asked by At

I tried using openpyxl and had a lot of trouble, now im trying pandas and an getting tons of tracebacks. Seems like copy pasting merged cells is hard and very specific, either that or im just a nooby. Can anyone help with tips or solutions?

Heres my code with pandas

import os
import pandas as pd

# Set the paths for the interim and finals folders
interim_folder = r'C:\Users\Jake_\Downloads\Code\Interims'
finals_folder = r'C:\Users\Jake_\Downloads\Code\Finals'

# Get a list of all files in the interim folder
interim_files = os.listdir(interim_folder)

# Iterate over each file in the interim folder
for file_name in interim_files:
    # Check if the file is an Excel file
    if file_name.endswith('.xlsx'):
        interim_file_path = os.path.join(interim_folder, file_name)
        finals_file_path = os.path.join(finals_folder, file_name)

        # Read the interim Excel file using pandas with the xlrd engine
        interim_data = pd.read_excel(interim_file_path, sheet_name='Financial Report Interim Mar 10', engine='xlrd')

        # Copy the required data to a new DataFrame
        finals_data = pd.DataFrame()
        finals_data['Column B'] = interim_data.iloc[3:8, 1]  # Copy B4:B8
        finals_data['Column D'] = interim_data.iloc[5:201, 3]  # Copy D6:D200
        finals_data['Column E'] = interim_data.iloc[5:201, 4]  # Copy E6:E200
        finals_data['Column F'] = interim_data.iloc[5:201, 5]  # Copy F6:F200
        finals_data['Column G'] = interim_data.iloc[5:201, 6]  # Copy G6:G200

        # Write the data to the finals Excel file
        writer = pd.ExcelWriter(finals_file_path, engine='openpyxl')
        finals_data.to_excel(writer, sheet_name='Final Financial Report July 7', index=False)
        writer.save()

and heres the tracebacks i get

What im trying to do: I have about 100 files in a folder called interims - and each of those files has a corresponding file thats literally the same name a folder called finals. I want to copy a specific few cells from a specific worksheet called "Financial Report Interim Mar 10" from each of the files in the interim folder and paste into a worksheet called "Final Financial Report July 7" in the corresponding files in the finals folder. Have been trying and failing for like 6 hours.

Thanks

Tried using openpyxl and now pandas, still no luck at all.

1

There are 1 answers

8
moken On

It looks like the excel workbook you have provided is the one being written to i.e. the 'Final' sheet, so not sure what the interim sheet looks like.
However to address the actual issue in the Openpyxl only code from the traceback you provided;

Traceback (most recent call last):
  File "C:\Users\Jake_\Downloads\step5.py", line 34, in <module>
    merged_range = interim_worksheet.merged_cells[cell.coordinate]
TypeError: 'MultiCellRange' object is not subscriptable

You want to obtain the top left cell of the merged cells. That cannot be obtained directly from merged cells, change that section

for cell in row:
    if cell.coordinate in interim_worksheet.merged_cells:
        merged_range = interim_worksheet.merged_cells[cell.coordinate]
        top_left_cell = merged_range.split(':')[0]
        finals_worksheet[cell.coordinate].value = interim_worksheet[top_left_cell].value
    else:

to something like this

for cell in row:
    if cell.coordinate in interim_worksheet.merged_cells:
        top_left_cell = ''
        for item in interim_worksheet.merged_cells.ranges:
            top_left_cell = item.coord.split(':')[0]
        finals_worksheet[cell.coordinate].value = interim_worksheet[top_left_cell].value
else:

With the above code example if the cells D6 and D7 were merged then top_left_cell will be 'D6'.

But as noted in the comment with the main loop;

for row in interim_worksheet.iter_rows(min_row=6, max_row=200, min_col=4, max_col=7):
    for cell in row:

you will be trying to process non existent cells. Using the example of cells D6 and D7 being merged, on the first row loop the cells will be D6, E6, F6 etc, then on the second loop the first cell will be the non existant D7 so at that point the code needs to determine that or it will break again.

The same issue can also occur in the opposite way. If D7 is a valid cell in the interim sheet but merged on the Final sheet (and top left is 'D6') then the line

finals_worksheet[cell.coordinate].value = cell.value

would also fail since you cannot write to it in the Finals sheet. It would return the error;

AttributeError: 'MergedCell' object attribute 'value' is read-only