Get correct background color for merged cells in xlsx file

40 views Asked by At

I am currently trying to extract cells' background color from a xlsx file: image

I've tried two ways obtained from other stackoverflow posts:

1)

wb = load_workbook(excel_file, data_only = True)
sh = wb[wb.sheetnames[0]]
rows = sh.max_row 
cols = sh.max_column
bckg_color = np.empty(shape=(rows, cols), dtype=object)
for i in range(1,rows+1):
    for j in range(1,cols+1):
        cell = sh.cell(column=j, row=i)
        color_in_hex = cell.fill.start_color.index
        bckg_color[i-1, j-1] = str(color_in_hex)

pd.DataFrame(bckg_color)

sf = StyleFrame.read_excel(excel_file, read_style=True, use_openpyxl_styles=False, header=None)
bckg_color = StyleFrame(sf.applymap(lambda cell: cell.style.bg_color)).data_df.astype(str)

bckg_color

Both of them give the same result: df screenshot

The expected result was the same color on the 4th row, but it's not because of merged cells in that row. Is there a robust way (expect using bfill on the color dataframe) to get the colors so that the whole row would have the color I see on the screenshot? I suspect it can be done with getting information about merged cells from openpyxl, but I would not like to resort to that.

1

There are 1 answers

0
kb2136 On BEST ANSWER

I don't think you can avoid dealing with merged cells via openpyxl. But it's quite simple, just check whether the cell belongs to a merged range or not:

for i in range(1,rows+1):
    for j in range(1,cols+1):
        cell = None

        for mcl in sh.merged_cells:
            if mcl.min_row <= i <= mcl.max_row and mcl.min_col <= j <= mcl.max_col:
                cell = mcl.start_cell # that's where the background info is stored
                break

        if cell is None: cell = sh.cell(column=j, row=i)
        color_in_hex = cell.fill.start_color.index
        bckg_color[i-1, j-1] = str(color_in_hex)